DETAILED ACTION
This communication is in response to the application filed 1/28/20 in which claims 1-20 were presented for examination.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter
Claims 1-4 are allowed.
Claims 9, 10, 17, and 18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claims 5-8 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Mitchell (US 2010/0161616 A1; published Jun. 24, 2010) in view of Meier (US 2014/0369602 A1; published Dec. 18, 2014).
Regarding claim 5, Mitchell discloses [a] system for format-agnostic document ingestion, comprising: 
a document ingestion server comprising a processor and a memory, the document ingestion server communicatively coupled to a database, the processor configured to: (see paragraphs 13, 41)
receive an image of a document, the document comprising text arranged in an unknown format; (see paragraph 26 (the system receives image data))
convert, using optical character recognition, the image of the document into a plurality of text elements, each text element comprising a content, a size, and an absolute position within the document; (see paragraph 26 (the system extracts indexing data (metadata) from the unstructured content such as by way of optical character recognition of the image data))
retrieve a plurality of data detectors from the database, each data detector associated with a data type that is anticipated to be in the document, each data detector comprising at least one identifier that is a potential label, at least one direction describing a potential relative direction of a text element having a label associated with the data detector, and at least one validation criteria, wherein each validation criteria describes a valid format; (see paragraph 32 (the processing engine accesses the ERP schemas and associated data))
identify a potential descriptor by comparing the content of each text element with the at least one identifier of at least one data detector; (see paragraph 29 (the processing engine compares document indexing data or metadata captured by the CC system by OCR against ERP database records for validity))
determine if the text element pointed to by one of the at least one direction of the data detector used to identify the potential descriptor meets the validation criteria of the data detector; (see paragraph 26 (the processing engine determines matches as well as mismatches or invalid data found in the indexing data when compared to matching structured content records in the ERP database))
associate the validated text element with the data detector; and (see paragraphs 26 (the processing engine informs the CC system of any matches), 27 (based on the results of the validation, the indexing data is modified to synchronize the indexing data associated with the unstructured content))
store, for each text element associated with one data detector of the plurality of data detector, the content of the text element, in the database (see paragraph 27 (the unstructured content record is stored with the indexing data)).
Although Mitchell teaches obtaining metadata and indexing data from the image data by optical character recognition, Mitchell does not expressly disclose converting, using optical character recognition, the image of the document into a plurality of text elements, each text element comprising a content, a size, and an absolute position within the document. However, Meier teaches a method to select and extract data among optical character recognition returned strings to automatically process documents, where the OCR data includes a string's location, length, and content. See paragraph 31. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mitchell to obtain content, size, and positional information for each text element. Doing so would enable processing textual data having spatial characteristics, e.g., tabular data.
Claim 13 is a method claim corresponding to claim 5 and is similarly rejected.
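For illustration only, the sequence of steps recited in claim 5 (compare text element content with detector identifiers, follow a relative direction, validate the pointed-to element, associate and store) can be sketched as below. This is a minimal sketch under stated assumptions and is not drawn from Mitchell, Meier, or applicant's specification: the names TextElement, DataDetector, neighbor, and extract are hypothetical, the direction is limited to "right" and "below", and the validation criteria is assumed to be a regular expression.

```python
import re
from dataclasses import dataclass


@dataclass
class TextElement:
    content: str
    x: float       # absolute horizontal position of the element
    y: float       # absolute vertical position of the element
    width: float   # size of the element
    height: float


@dataclass
class DataDetector:
    data_type: str
    identifiers: list   # potential labels (hypothetical examples only)
    directions: list    # assumed values: "right" or "below"
    validation: str     # assumed form: a regular expression for a valid format


def neighbor(label, elements, direction):
    """Return the nearest element in the given direction from label, or None."""
    if direction == "right":
        found = [e for e in elements
                 if e is not label and e.x > label.x
                 and abs(e.y - label.y) < label.height]
        return min(found, key=lambda e: e.x - label.x, default=None)
    if direction == "below":
        found = [e for e in elements
                 if e is not label and e.y > label.y
                 and abs(e.x - label.x) < label.width]
        return min(found, key=lambda e: e.y - label.y, default=None)
    return None


def extract(elements, detectors):
    """Map each detector's data type to the first validated text element content."""
    results = {}
    for det in detectors:
        for el in elements:
            # Identify a potential descriptor by comparing content with identifiers.
            if not any(i.lower() in el.content.lower() for i in det.identifiers):
                continue
            for direction in det.directions:
                target = neighbor(el, elements, direction)
                # Keep the pointed-to element only if it meets the validation criteria.
                if target is not None and re.fullmatch(det.validation, target.content):
                    results[det.data_type] = target.content
                    break
            if det.data_type in results:
                break
    return results
```

In this sketch, a label such as "Total" with a currency-formatted element directly beneath it would be associated with a hypothetical "total" detector whose directions include "below".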

Regarding claim 6, Mitchell, in view of Meier, discloses the invention of claim 5 as discussed above. Mitchell further discloses wherein the processor is further configured to identify a document type by searching the content of each text element for a plurality of distinguishing strings, each distinguishing string being unique to one document type; and wherein the plurality of data detectors retrieved from the database is selected based on the document type (see paragraph 29 (determining which type of structured content records correspond to the unstructured content records using mapping data)).
Claim 14 is a method claim corresponding to claim 6 and is similarly rejected.
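For illustration only, the document-type limitation of claim 6 (searching text element content for distinguishing strings, each unique to one document type) can be sketched as below. This is a minimal sketch under stated assumptions and does not reproduce Mitchell's mapping data: the table DISTINGUISHING_STRINGS, its entries, and the function identify_document_type are hypothetical.

```python
# Hypothetical table: each distinguishing string is assumed to be unique to one document type.
DISTINGUISHING_STRINGS = {
    "invoice": ["Invoice No", "Remit To"],
    "purchase_order": ["Purchase Order", "Ship To"],
}


def identify_document_type(contents, distinguishing=DISTINGUISHING_STRINGS):
    """Return the first document type with a distinguishing string found in any text element, else None."""
    lowered = [c.lower() for c in contents]
    for doc_type, strings in distinguishing.items():
        if any(s.lower() in c for s in strings for c in lowered):
            return doc_type
    return None
```

The returned type could then be used as a key for selecting which data detectors to retrieve.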

Regarding claim 7, Mitchell, in view of Meier, discloses the invention of claim 5 as discussed above. Mitchell further discloses wherein each identifier is at least one of a potential label and a potential format (see paragraph 42 (describing various fields of the structured record)).
Claim 15 is a method claim corresponding to claim 7 and is similarly rejected.

Regarding claim 8, Mitchell, in view of Meier, discloses the invention of claim 5 as discussed above. Mitchell further discloses wherein each validation criteria describes at least one of a valid format and a valid range (see paragraph 47).
Claim 16 is a method claim corresponding to claim 8 and is similarly rejected.

Claims 11 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Mitchell and Meier as applied to claims 5 and 13 above, and further in view of Chew (WO 2018/125264 A1; published Jul. 5, 2018).
Regarding claim 11, Mitchell, in view of Meier, discloses the invention of claim 5 as discussed above. Mitchell does not disclose wherein the processor is further configured to: train a machine learning model correlating text elements with the data detectors they have been associated with; determine whether the machine learning model performs better than one or more data detectors; and automatically employ the machine learning model in place of the one or more data detectors once the machine learning model outperforms the one or more data detectors. However, Chew teaches balancing a task between a machine learning model and a statistical model. Abstract. A server uses a machine learning model and a statistical model for a content selection task and assigns more users to the machine learning model when it outperforms the statistical model. Id. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mitchell to train a machine learning model to index the unstructured data and then assign users to the machine learning model when it outperforms the rule-based model. Doing so would provide a baseline assurance of performance in combination with the benefits of a well-trained machine learning model. Chew, paragraph 5.
Claim 19 is a method claim corresponding to claim 11 and is similarly rejected.
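For illustration only, the comparison recited in claim 11 (employ the machine learning model in place of the data detectors once it outperforms them) can be sketched as below. This is a minimal sketch under stated assumptions: performance is assumed to be accuracy on a labeled holdout set, the switch is all-or-nothing rather than the gradual reassignment of users described in Chew, and the names accuracy and choose_extractor are hypothetical.

```python
def accuracy(predict, labeled_examples):
    """Fraction of (text, expected) pairs for which predict(text) equals expected."""
    if not labeled_examples:
        return 0.0
    correct = sum(1 for text, expected in labeled_examples if predict(text) == expected)
    return correct / len(labeled_examples)


def choose_extractor(detector_predict, model_predict, holdout):
    """Return model_predict only if it is strictly more accurate than detector_predict on holdout."""
    if accuracy(model_predict, holdout) > accuracy(detector_predict, holdout):
        return model_predict
    return detector_predict
```

Keeping the detector as the default on a tie reflects the baseline-assurance rationale noted above.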

Claims 12 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Mitchell and Meier as applied to claims 5 and 13 above, and further in view of Hilbert (US 2005/0076013 A1; published Apr. 7, 2005).
Regarding claim 12, Mitchell, in view of Meier, discloses the invention of claim 5 as discussed above. Mitchell teaches extracting information from a paper or electronic document such as an invoice and linking the information to structured content associated with a previously generated purchase order. See, e.g., paragraph 40. Yet, Mitchell does not particularly disclose wherein determining the source of the document comprises identifying all postal addresses in the document based upon an observed format, validating each postal address, placing each postal address in a standard format, and comparing each address with a list of addresses unique to each of a plurality of known document sources. However, Hilbert teaches extracting postal addresses from one or more pages of a document by parsing the text to find recognizable information. See paragraph 38. Specifically, Hilbert teaches parsing a series of searchable portions such as groups of characters and determining whether the group of characters is a postal code and if so, analyzing surrounding lines of text to obtain the remaining parts of an address. See paragraphs 40-41. Further, Hilbert teaches determining whether the preceding lines of text correspond to an address and comparing to a list of contacts in a contact information database to see if it matches. See paragraphs 42-51. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mitchell to identify and extract postal address information from the invoice documents, at least because doing so would enable retrieving contacts based on the context of the document. Hilbert, paragraph 9.
Claim 20 is a method claim corresponding to claim 12 and is similarly rejected.
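For illustration only, the address-handling steps recited in claim 12 (identify postal addresses by observed format, place each in a standard format, and compare with a list of addresses for known sources) can be sketched as below. This is a minimal sketch under stated assumptions and is not Hilbert's method, which begins at a postal code and analyzes surrounding lines: here a single regular expression assumes a one-line U.S.-style "street, city, ST ZIP" address, validation is limited to that format check, and the names ADDRESS_RE, normalize, find_addresses, and identify_source are hypothetical.

```python
import re

# Assumed observed format: "<number> <street>, <city>, <2-letter state> <5-digit ZIP>[-<4 digits>]"
ADDRESS_RE = re.compile(
    r"(?P<street>\d+\s+[A-Za-z0-9 .]+?),\s*"
    r"(?P<city>[A-Za-z .]+?),\s*"
    r"(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5})(?:-\d{4})?"
)


def normalize(match):
    """Place a matched address in one standard form: upper case, no periods, single spaces."""
    street = " ".join(match["street"].upper().replace(".", "").split())
    city = " ".join(match["city"].upper().replace(".", "").split())
    return f"{street}, {city}, {match['state'].upper()} {match['zip']}"


def find_addresses(text):
    """Return every address in text that fits the assumed format, in standard form."""
    return [normalize(m) for m in ADDRESS_RE.finditer(text)]


def identify_source(text, known_sources):
    """Return the source whose known address appears in text, else None."""
    for address in find_addresses(text):
        if address in known_sources:
            return known_sources[address]
    return None
```

Here known_sources is assumed to be a dictionary keyed by standardized address, so that each address maps to exactly one document source.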

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHAHID K KHAN whose telephone number is (571)270-0419. The examiner can normally be reached M-F, 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Stephen Hong, can be reached at (571)272-4124. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SHAHID K KHAN/Examiner, Art Unit 2178