DETAILED ACTION
This communication is in response to the amendment filed 8/29/22 in which claim 1 was amended. Claims 1-20 are pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 8/29/22 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5 and 7 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Pandian (US 2005/0289182 A1; published Dec. 29, 2005).
Regarding claim 1, Pandian discloses [a] method to perform data extraction for document processing, the method comprising:
configuring one of a plurality of templates by defining a plurality of fields to be extracted and a position of each of the defined plurality of fields; (see paragraphs 122 (the user specifies the data to find, defines master zone templates, and builds a template library which the optical character recognition module uses to process and extract data from structured images), 125 (when a user defines a zone, the user specifies the properties of the data he or she wants to extract: a type of data (integer, decimal, alphanumeric, date), or a data-input format (check box, radio button, image, table)), 129 (the user can define clues instructing the extraction engine to look for a dictionary entry in a specific place on the image (for example, in the top left-hand corner of the page))))
identifying a document type using the plurality of templates; (see paragraph 141 (a template library for structured forms is utilized by the enhanced image module to enable identification of structured forms; a form template is accessed and compared with an input image to identify that the input image has a structure which is known, for example, to be a Bank of America account statement; such identification occurs by identifying a particular logo in an area of an input image by comparison with a template))
processing the document with one of the configured plurality of templates associated with the identified document type; (see paragraphs 141 (the identification of a particular structured form from the template library is utilized to determine the appropriate image enhancement and/or OCR techniques to be used), 142 (the processing includes dictionary-entry extraction to identify key fields in the input image))
extracting data from the document using the defined plurality of fields from the one of the configured plurality of templates associated with the document to be extracted and the position of the data within each of the plurality of fields (see paragraphs 142 (describing dictionary-entry extraction to identify key fields in the input image, verification of extracted data, and generation of indexed and collated documents, preferably in a standard format such as XML), 145 (the template is located and operations are performed in which data is extracted from user-specified zones identified in the template file)).

Regarding claim 2, Pandian discloses the invention of claim 1 as discussed above. Pandian further discloses improving the accuracy of the performed data extraction by enhancing the one of the plurality of templates with feedback from a validation of extracted data (see paragraphs 11 (critical content extracted is validated), 83 (the data extraction module executes validation scripts to check the data against an external source), 120 (quality assurance/verifier allows the user to verify and correct the extracted XML output), 125 (the user can specify the type of validation the application will do on fields once the data is extracted), 173 (input images in the invalid images folder may be the subject of further review, either by a manual study of the input image file or by an automated invalid image analysis; if the input image is from a known entity or document type, the appropriate corrective action is readily determined and taken to correct the existing image data problem)).

Regarding claim 3, Pandian discloses the invention of claim 1 as discussed above. Pandian further discloses wherein the templates include field level settings (see paragraph 125).

Regarding claim 4, Pandian discloses the invention of claim 3 as discussed above. Pandian further discloses wherein the field level settings include anchors (see paragraph 125).

Regarding claim 5, Pandian discloses the invention of claim 1 as discussed above. Pandian further discloses defining a general taxonomy (see paragraph 122 (discussing a master zone template)).

Regarding claim 7, Pandian discloses the invention of claim 5 as discussed above. Pandian further discloses storing the defined general taxonomy (see paragraph 122 (discussing a master template library)).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Pandian as applied to claim 1 above, and further in view of Gajera (US 10,769,427 B1; published Sep. 8, 2020).
Regarding claim 9, Pandian discloses the invention of claim 1 as discussed above. Pandian does not disclose executing a digitization activity in an RPA workflow. However, Gajera teaches an RPA system that operates to automatically identify fields provided by an application. See column 4, lines 1-14. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to incorporate the teachings of Gajera to perform the data extraction from electronic documents as diverse as facsimile images, scanned images, and images from document management systems, at least because doing so would enable the automatic extraction of data from a structured document without manual intervention.

Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Pandian as applied to claim 1 above, and further in view of Schmidtler (US 2008/0086432 A1; published Apr. 10, 2008).
Regarding claim 13, Pandian discloses the invention of claim 1 as discussed above. Pandian does not expressly disclose classifying into one or more document types using one or more classifiers. However, Schmidtler teaches a method for classifying unlabeled documents using a trained classifier. See paragraph 7. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to incorporate the teachings of Schmidtler to employ a trained classifier to classify the structured documents, at least because doing so would enable adaptively learning to classify documents in relation to drifting classification concepts without manual effort. Schmidtler, paragraph 43.

Regarding claim 14, Pandian, in view of Schmidtler, discloses the invention of claim 13 as discussed above. Pandian does not expressly disclose outputting the classification information. However, Schmidtler teaches a method for classifying unlabeled documents using a trained classifier. See paragraph 7. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to incorporate the teachings of Schmidtler to employ a trained classifier to classify the structured documents, at least because doing so would enable adaptively learning to classify documents in relation to drifting classification concepts without manual effort. Schmidtler, paragraph 43.

Claims 6 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Pandian as applied to claim 5 above, and further in view of Jones (US 2004/0162833 A1; published Aug. 19, 2004).
Regarding claim 6, Pandian discloses the invention of claim 5 as discussed above. Pandian does not specifically disclose wherein the defining includes: providing an interface facilitating the defining of a list of document types targeted for classification and data extraction with associated fields for each of the document types; and receiving the list of defined document types and the associated fields for each of the defined document types. However, Jones teaches structuring a document with one or more elements and associating the document with particular tables within a document library. See paragraphs 6, 7. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to associate the image file with one or more elements and a table within a document library. Doing so would synchronize the form with the elements of the document as structured in the database. 

Regarding claim 8, Pandian discloses the invention of claim 7 as discussed above. Pandian does not disclose wherein the storing includes storing a list of document types and associated fields in a master taxonomy data structure. However, Jones teaches structuring a document with one or more elements and associating the document with particular tables within a document library. See paragraphs 6, 7. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to associate the image file with one or more elements and a table within a document library. Doing so would synchronize the form with the elements of the document as structured in the database.

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Pandian in view of Gajera as applied to claim 9 above, and further in view of Kumar (US 10,489,682 B1; published Nov. 26, 2019).
Regarding claim 10, Pandian, in view of Gajera, discloses the invention of claim 9 as discussed above. Pandian does not disclose wherein the digitization activity uses a plurality of OCR engines and the digitization activity includes implementing a voting system for the plurality of OCR engines and outputting a best combined result from the plurality of OCR engines. However, Kumar teaches an OCR system employing a deep learning system that is trained to process a plurality of images representing text. See Abstract. The deep learning system is trained with training data generated from a corpus of real life text segments that are generated by a plurality of OCR modules. Id. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to employ deep learning technology to enhance the text detection module. Doing so would enable more accurate text detection because the deep learning system is trained on more disparate, wider and different statistics of text images as compared to a particular domain of interest. Kumar, column 3, lines 1-3.

Claims 11 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Pandian in view of Gajera as applied to claim 9 above, and further in view of Ahern (US 2007/0118538 A1; published May 24, 2007).
Regarding claim 11, Pandian, in view of Gajera, discloses the invention of claim 9 as discussed above. Pandian does not specifically disclose outputting a text version of a file and a Document Object Model (DOM) stored in a DOM data structure. However, Ahern teaches generating a data graph for an external data model for a forms based application and transforming the data graph into a DOM representation of the data graph. See paragraph 9. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to generate a data graph and a DOM representation for a forms based application. Doing so would enable forms processing through a DOM accessible API. Ahern, paragraph 7.

Regarding claim 12, Pandian, in view of Ahern, discloses the invention of claim 11 as discussed above. Pandian does not specifically disclose wherein the DOM includes information pertaining to typed sections, typed word groups, and word level information in the file that are enhanced with rotation, skew, and relative width and height information. However, Ahern teaches generating a data graph for an external data model for a forms based application and transforming the data graph into a DOM representation of the data graph. See paragraph 9. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to generate a data graph and a DOM representation for a forms based application. Doing so would enable forms processing through a DOM accessible API. Ahern, paragraph 7.

Claims 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Pandian as applied to claim 1 above, and further in view of Ghatage (US 2020/0234183 A1; published Jul. 23, 2020).
Regarding claim 15, Pandian discloses the invention of claim 1 as discussed above. Pandian does not specifically disclose executing a classification validation module providing an interface for reviewing, correcting, and/or manually processing files for classification and outputting the classification information. However, Ghatage teaches training AI based models to identify and classify documents, including receiving a document, identifying a type of document, extracting entities and their corresponding name-value pairs, mapping the name-value pairs to an ontology, and generating a report displaying the mappings. See paragraphs 17, 43. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to display results of the image classification. Doing so would enable manual validation of the mappings. Ghatage, paragraph 19.

Regarding claim 16, Pandian discloses the invention of claim 1 as discussed above. Pandian does not specifically disclose completing a feedback loop for executed classifiers. However, Ghatage teaches training AI based models to identify and classify documents, including receiving a document, identifying a type of document, extracting entities and their corresponding name-value pairs, mapping the name-value pairs to an ontology, and generating a report displaying the mappings. See paragraphs 17, 43. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to display results of the image classification. Doing so would enable manual validation of the mappings. Ghatage, paragraph 19.

Regarding claim 17, Pandian discloses the invention of claim 1 as discussed above. Pandian does not specifically disclose extracting from the document using one or more extractors and outputting the extracted data. However, Ghatage teaches training AI based models to identify and classify documents, including receiving a document, identifying a type of document, extracting entities and their corresponding name-value pairs, mapping the name-value pairs to an ontology, and generating a report displaying the mappings. See paragraphs 17, 43. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to display results of the image classification. Doing so would enable manual validation of the mappings. Ghatage, paragraph 19.

Regarding claim 18, Pandian discloses the invention of claim 1 as discussed above. Pandian does not specifically disclose executing a data extraction validation module providing an interface for correcting, and/or manually processing data points from the data extraction and outputting confirmed extracted data. However, Ghatage teaches training AI based models to identify and classify documents, including receiving a document, identifying a type of document, extracting entities and their corresponding name-value pairs, mapping the name-value pairs to an ontology, and generating a report displaying the mappings. See paragraphs 17, 43. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to display results of the image classification. Doing so would enable manual validation of the mappings. Ghatage, paragraph 19.

Regarding claim 19, Pandian discloses the invention of claim 1 as discussed above. Pandian does not specifically disclose executing an extractor training module that facilitates completion of a feedback loop for one or more extractors. However, Ghatage teaches training AI based models to identify and classify documents, including receiving a document, identifying a type of document, extracting entities and their corresponding name-value pairs, mapping the name-value pairs to an ontology, and generating a report displaying the mappings. See paragraphs 17, 43. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to display results of the image classification. Doing so would enable manual validation of the mappings. Ghatage, paragraph 19.

Regarding claim 20, Pandian discloses the invention of claim 1 as discussed above. Pandian does not specifically disclose exporting the processed data including classification results and extracted information. However, Ghatage teaches training AI based models to identify and classify documents, including receiving a document, identifying a type of document, extracting entities and their corresponding name-value pairs, mapping the name-value pairs to an ontology, and generating a report displaying the mappings. See paragraphs 17, 43. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Pandian to display results of the image classification. Doing so would enable manual validation of the mappings. Ghatage, paragraph 19.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHID K KHAN whose telephone number is (571)270-0419. The examiner can normally be reached M-F, 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached at (571)272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SHAHID K KHAN/Examiner, Art Unit 2178