EXAMINER'S AMENDMENT

An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given via communication with Michael W. Farn on 22 February 2022.

The application has been amended as follows: 
1.	(Currently amended) A method implemented on a computer system executing instructions for processing documents, the method comprising:
processing a document set that contains a plurality of documents to identify chunks in the documents and to generate corresponding annotations, comprising stages of:
processing images of the documents to identify visual chunks that comprise visually distinct regions of the images of the documents; and generating first annotations that specify spacing and formatting of the visual chunks, the first annotations including signatures of bitmap tiles of the images of the documents;
processing the visual chunks and first annotations to identify structural chunks that contain content from structures within the visual chunks; 
processing the structural chunks and second annotations to identify topic-level chunks based on a grouping of content in structural chunks according to topic; and generating third annotations that specify topics of the topic-level chunks; and
processing the topic-level chunks and third annotations to identify field chunks comprising content suitable for use as fields in document templates; and generating fourth annotations that specify the fields of the field chunks; and
wherein at least some of the processing is performed in parallel by the computer system to identify the chunks and annotations; and
arbitrating among the identified chunks to produce a well-formed hierarchy of chunks, wherein arbitrating among the chunks comprises modifying some chunks to avoid overlap of the chunks within the well-formed hierarchy; and
generating representations of the processed documents in a format comprising the well-formed hierarchy of chunks that includes the field chunks and at least some of the other identified chunks from the documents and corresponding annotations for the chunks; and
making the representations in the format available for use by any of a plurality of software applications in downstream processes.

3.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein each of the stages of processing the documents uses machine learning, artificial intelligence and/or natural language processing.
4.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein each of the stages of processing the documents identifies chunks with less than 100% confidence.
5.	(Original) The computer-implemented method of claim 4, wherein the representations of the processed documents further comprise annotations specifying confidence levels for the identification of chunks.
6.	(Original) The computer-implemented method of claim 4, further comprising:
receiving user corrections for incorrectly identified chunks; and
improving the stages of automatically identifying chunks in response to the user corrections.
7.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein the stages of processing visual chunks, processing structural chunks and processing topic-level chunks are performed recursively for visual chunks contained within other visual chunks.

9.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein some higher-level chunks contain other lower-level chunks as sub-chunks, and the representations of the processed documents further comprise annotations specifying containment of lower-level chunks in higher-level chunks.
10.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein some chunks have a hierarchical relationship, and the representations of the processed documents further comprise annotations specifying hierarchical relationships between chunks.
11.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein the chunks in the representations of the processed documents comprise a plurality of sections, headings, lists, items, markers, and/or Named Entities at multiple different levels.
12.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein the plurality of documents in the document set are all a same document type.
13.	(Currently amended) The computer-implemented method of claim 22 [[1]], further comprising:
assembling the document set by clustering documents into the document set based on similarity of content and/or layout.

15.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein the representations of the processed documents further comprise annotations for locations of chunks implemented using digital signatures.
16.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein the documents have original layouts, and the representations of the processed documents contain sufficient information to reconstruct the documents with the original layouts.
17.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein the plurality of software applications comprise software applications with a user interface for a user to create, edit and/or review the representations of the processed documents.
18.	(Currently amended) The computer-implemented method of claim 22 [[1]], wherein the format is a standardized, published format.
19-20.	(Cancelled)
21.	(Currently amended) A method implemented on a computer system executing instructions for processing documents, the method comprising:
processing a document set that contains a plurality of documents to identify chunks in the documents and to generate corresponding annotations, comprising stages of:
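processing images of the documents to identify visual chunks that comprise visually distinct regions of the images of the documents; and generating first annotations that specify spacing and formatting of the visual chunks, the first annotations including signatures of bitmap tiles of the images of the documents;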
processing the visual chunks and first annotations to identify structural chunks that contain content from structures within the visual chunks; and generating second annotations that specify layout of the structural chunks, the structural chunks including hyperlines of text; 
generating signatures of the hyperlines within the structural chunks and renesting the structural chunks based on the signatures to create a nested structure of structural chunks; and
processing the nested structure of structural chunks and second annotations to identify topic-level chunks based on a grouping of content in structural chunks according to topic; and generating third annotations that specify topics of the topic-level chunks; and
processing the topic-level chunks and third annotations to identify field chunks comprising content suitable for use as fields in document templates; and generating fourth annotations that specify the fields of the field chunks;

making the representations in the format available for use by any of a plurality of software applications in downstream processes.
22.	(Previously Presented) A method implemented on a computer system executing instructions for processing documents, the method comprising:
processing a document set that contains a plurality of documents to identify chunks in the documents and to generate corresponding annotations, comprising stages of:
processing images of the documents to identify visual chunks that comprise visually distinct regions of the images of the documents; and generating first annotations that specify spacing and formatting of the visual chunks, the first annotations including signatures of bitmap tiles of the images of the documents;
processing the visual chunks and first annotations to identify structural chunks that contain content from structures within the visual chunks; and generating second annotations that specify layout of the structural chunks; 
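processing the structural chunks and second annotations to identify topic-level chunks based on a grouping of content in structural chunks according to topic; and generating third annotations that specify topics of the topic-level chunks; and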
processing the topic-level chunks and third annotations to identify field chunks comprising content suitable for use as fields in document templates; and generating fourth annotations that specify the fields of the field chunks;
generating representations of the processed documents in a format comprising the field chunks and at least some of the other identified chunks from the documents and corresponding annotations for the chunks; and
making the representations in the format available for use by any of a plurality of software applications in downstream processes.
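
For illustration only, and not as part of the record or of any claim: the arbitrating step recited in claim 1 (modifying some chunks to avoid overlap within a well-formed hierarchy) can be sketched as follows. This is a minimal sketch under stated assumptions. The `Chunk` type, the character-offset spans, the `arbitrate` function, and the rule of trimming a partially overlapping chunk to the end of its enclosing chunk are all hypothetical choices made for the example; the claims do not specify how chunks are represented or modified.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical chunk: a labeled half-open span [start, end) of a document."""
    label: str
    start: int
    end: int
    children: list = field(default_factory=list)


def arbitrate(chunks):
    """Return root chunks of a hierarchy in which any two chunks are
    either nested or disjoint (one reading of "well-formed").

    Assumption: a chunk that partially overlaps an earlier chunk is
    trimmed so that it ends where the earlier chunk ends.
    """
    # Earlier start first; for equal starts, the longer chunk first so
    # that it becomes the parent of the shorter one.
    ordered = sorted(chunks, key=lambda c: (c.start, -c.end))
    roots, open_chunks = [], []
    for src in ordered:
        c = Chunk(src.label, src.start, src.end)  # copy; inputs stay unmodified
        # Close every open chunk that ends at or before this chunk's start.
        while open_chunks and open_chunks[-1].end <= c.start:
            open_chunks.pop()
        if open_chunks and c.end > open_chunks[-1].end:
            # Partial overlap: modify the chunk so it fits inside its parent.
            c.end = open_chunks[-1].end
        (open_chunks[-1].children if open_chunks else roots).append(c)
        open_chunks.append(c)
    return roots


detected = [
    Chunk("section", 0, 100),
    Chunk("paragraph", 10, 50),
    Chunk("list", 40, 70),    # partially overlaps "paragraph"
    Chunk("field", 45, 48),
]
hierarchy = arbitrate(detected)
```

In this example the "list" chunk is trimmed to the span 40 to 50 and nested under "paragraph", and the "field" chunk is nested under "list". Other policies, such as splitting a chunk or preferring the chunk with the higher confidence level, would fit the claim language equally well.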

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher D. Kells, whose telephone number is (571) 270-7729. The examiner can normally be reached Monday - Friday, 8:00 a.m. - 4:00 p.m.
Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Asher D. Kells
Primary Examiner
Art Unit 2144



/Asher D Kells/Primary Examiner, Art Unit 2144