DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Summary
The following is a Final Office Action in response to the communication received on June 24, 2022.  
Claims 1, 9, and 16 have been amended.  
Claims 3, 10, 12-13, and 17-18 have been cancelled.  
Claims 1-2, 4-9, 11, 14-16, and 19-20 are pending.

Response to Amendment
Amendments to Claims 1, 9, and 16 are acknowledged.  Amendments to Claims 1, 9, and 16 are sufficient to overcome the 35 USC 101 rejection of Claims 1-2, 4-9, 11, 14-16, and 19-20.  Amended Claims 1, 9, and 16 provide specific limitations for using and training a machine learning model for extracting data from a plurality of documents or emails.  Specific techniques for training the machine learning model are not well-understood, routine, conventional computer functions.  


Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-9, 11, 14-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over US Pat Pub No 2021/0133498 “Zhang”, in view of US Pat No 10,673,880 “Pratt”.

As per Claim 1, Zhang discloses a method for machine-assisted document input, comprising: 
receiving, at a data extraction application executed by a computer processor, a document or email, wherein the document or email comprises a billing statement (Zhang: [0030] data extraction system includes computing device comprising or in communication with document similarity engine, target document content extractor, consumer application(s) and database. [0033], document similarity engine may receive a target electronic document over the network, and may include a financial document or invoice; the examiner finds the document similarity engine of Zhang to satisfy the claimed “data extraction application”); 
generating, by the data extraction application (Zhang: [0034], document similarity engine 120 includes a text determiner 122), a transcript of the document or email, wherein the transcript comprises a plurality of text groups from the document or email and a location for each text group in the document or email (Zhang: [0034], Text determiner may receive an electronic document, such as an image document (e.g., a PDF document), and then determines text contained within the electronic document; [0051], the text modifier may determine a set of logic, rules, conditions, associations, or classification models (e.g., automatically, such as through machine learning, or manually through manual input) based on the characteristics of the identified portion of text to identify predefined portions of text in future documents. Example characteristics include a predefined term(s), a format of the text, symbols, numeric text, or context of the text associated with the portion of text identified by the user; [0035], in some aspects, text determiner 122 may determine text within the electronic document through an optical character recognition (“OCR”) engine. For example, in aspects where the electronic document is an image document that has not been pre-processed, the text determiner 122 processes the electronic document so as to determine text contained within the electronic document. Text determiner 122 may utilize an OCR engine to convert the image document into a document having machine-encoded text and determine the text contained within the electronic document. In some aspects, the text determiner 122 may disregard graphs, pictures, or other images in determining the text of the electronic document); 
identifying, by the data extraction application, a vendor associated with the document or email based on contents of one of the text groups (Zhang: [0077], entity-document association engine (which is part of document similarity engine as shown in Fig. 1) stores an indication that a document representation is associated with a particular entity, such as a particular name and/or name of an organization); 
retrieving, by the data extraction application, a vendor-specific machine learning model trained for the vendor (Zhang: [0077], entity-document association engine (which is part of document similarity engine as shown in Fig. 1) stores an indication that a document representation is associated with a particular entity, such as a particular name and/or name of an organization; [0078], each reference document representation within the set of reference document representations may be designated as being associated with a particular entity by a user or an administrator; [0081], entity-document association engine utilizes a set of logic, rules, conditions, associations, or classification models, which may include one or more ML classification models, or other criteria, to determine an association between the electronic document and the entity); 
associating, by the data extraction application, each of the plurality of locations in the document or email with a billing field of the plurality of billing fields using the vendor-specific machine learning model (Zhang: [0081], entity-document association engine utilizes a set of logic, rules, conditions, associations, or classification models, which may include one or more ML classification models, or other criteria, to determine an association between the electronic document and the entity [0090], extraction model(s) may be a set of logic, rules, conditions, associations, or classification models, which may include one or more ML (Machine Learning) classification models, or other criteria, to identify where the data is located within the document  [0091], a first extraction model may identify a first set of locations (e.g., positions) for data while a second extraction model may identify a different set of locations (e.g., positions) for data. As a further example, in instances where the document representations are associated with a financial document (e.g., an invoice), an amount due may be located on different pages of an invoice, have different spacing between the amount due and the actual dollar amount (e.g., $304.56), or may be oriented above, below, or beside the text indicating the amount due); 
extracting, by the data extraction application, each of the text groups into one of the billing fields based on the association (Zhang: [0092], if target document data extractor receives a determination from document similarity determiner that a target document representation is similar to the first reference document representation, target document data extractor may utilize the extraction model associated with the first reference document representation to extract the content from the target electronic document. By way of example, target document data extractor may utilize the terms ‘Amount Due by 11/5/19’, surrounding words, spacing, and/or orientation that is determined based on the first reference electronic document (or representation thereof) to determine and/or extract the actual dollar amount (e.g., $304.56) from the text of the target electronic document); and 
transmitting, by the data extraction application, the billing fields with the extracted data to a user electronic device (Zhang: [0094], target document data extractor 140 may output the extracted data to other components of the data extraction system 100, thereby transmitting the data; [0096], a consumer application(s) may include a graphical user interface that causes the extracted data to be presented on a display of a computing device, which is the user electronic device to which the data is transmitted).  

Zhang fails to disclose methods for machine-assisted document input, comprising: 
identifying, by the data extraction application, one of the locations of one of the text groups; and
the vendor-specific machine learning model for the vendor being trained using a plurality of documents or emails for the vendor.

Pratt teaches methods for machine-assisted document input, comprising: 
identifying, by the data extraction application, one of the locations of one of the text groups (Pratt: Column 4, lines 47-61, machine data can have a predefined format, where data items with specific data formats are stored at predefined locations); and 
the vendor-specific machine learning model for the vendor being trained using a plurality of documents or emails for the vendor (Pratt: Column 39, lines 27-67, ML-based CEP engine is further capable of condensing and summarizing historical knowledge by observing streams of events through analysis of entity documents to train the machine learning models. This enables the ML-based CEP engine to include a form of historical comparison of entity (vendor) specific data as part of its analysis without consuming too much data storage capacity.).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to include using text location through pattern matching as taught by Pratt, with the motivation of enabling the ML-based CEP engine to include a form of historical comparison as part of its analysis without consuming too much data storage capacity (Pratt: Column 39, lines 27-67).

As per Claim 2, Zhang discloses a method, wherein the data extraction application identifies the vendor using a trained vendor identification machine learning model (Zhang: [0081]).  

As per Claims 3 and 10, Zhang discloses methods, wherein the vendor-specific machine learning model is trained using a plurality of documents or emails for the vendor (Zhang: [0079]- [0081]).  

As per Claims 4 and 11, Zhang discloses methods, wherein the billing fields comprise a vendor name field, a vendor address billing field, an account number billing field, and an amount billing field (Zhang: [0052] and [0093]).  

As per Claim 5, Zhang discloses a method, further comprising: 
applying, by the data extraction application, a pattern matching algorithm to the text groups in the transcript to identify the billing fields (Zhang: [0093]).  

As per Claim 6, Zhang discloses a method, wherein the pattern matching algorithm identifies the billing fields based on a pattern of the text groups and the locations of the text groups in the document or email (Zhang: [0089]).  

Zhang fails to disclose methods, wherein the pattern matching algorithm uses regular expressions.

Pratt teaches methods, wherein the pattern matching algorithm uses regular expressions (Pratt: Column 28, lines 45-63).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to include using text location through pattern matching using regular expressions as taught by Pratt, for identifying a vendor from document contents as taught by Zhang with the motivation of increasing efficiency by presenting information that is relevant while filtering out the information that is not relevant (Pratt: Column 7, lines 21-35).

As per Claims 7, 14, and 19, Zhang discloses methods, further comprising: 
classifying, by the data extraction application, contents of one of the text groups using a classification rule (Zhang: [0042]).  

As per Claims 8 and 20, Zhang discloses methods, wherein the document or email comprises an image (Zhang: [0035]).

As per Claim 9, Zhang discloses a method for machine-assisted document input, comprising: 
receiving, at a data extraction application executed by a computer processor, a document or email, wherein the document or email comprises a billing statement (Zhang: [0033], receive a target electronic document over the network, and may include a financial document or invoice); 
generating, by the data extraction application, a transcript of the document or email, wherein the transcript comprises a plurality of text groups from the document or email and a location for each text group in the document or email (Zhang: [0034], Text determiner may receive an electronic document, such as an image document (e.g., a PDF document), and then determines text contained within the electronic document, and [0051] the text modifier may determine a set of logic, rules, conditions, associations, or classification models (e.g., automatically, such as through machine learning, or manually through manual input) based on the characteristics of the identified portion of text to identify predefined portions of text in future documents. Example characteristics include a predefined term(s), a format of the text, symbols, numeric text, or context of the text associated with the portion of text identified by the user); 
retrieving, by the data extraction application, a vendor-agnostic machine learning model trained using a plurality of documents or emails from a plurality of vendors (Zhang: [0077], entity-document association engine stores an indication that a document representation is associated with a particular entity, such as a particular name and/or name of an organization; [0079], entity-document association engine may store an indication that a first invoice is associated with UPS Air Freight Services and a second invoice is associated with UPS Parcel Delivery Service; [0081], entity-document association engine utilizes a set of logic, rules, conditions, associations, or classification models, which may include one or more ML classification models, or other criteria, to determine an association between the electronic document and the entity);
associating, by the data extraction application, each of the plurality of locations in the document or email with a billing field of the plurality of billing fields using the vendor-specific machine learning model (Zhang: [0090], extraction model(s) may be a set of logic, rules, conditions, associations, or classification models, which may include one or more ML (Machine Learning) classification models, or other criteria, to identify where the data is located within the document; [0091], a first extraction model may identify a first set of locations (e.g., positions) for data while a second extraction model may identify a different set of locations (e.g., positions) for data. As a further example, in instances where the document representations are associated with a financial document (e.g., an invoice), an amount due may be located on different pages of an invoice, have different spacing between the amount due and the actual dollar amount (e.g., $304.56), or may be oriented above, below, or beside the text indicating the amount due); 
extracting, by the data extraction application, each of the text groups into one of the billing fields based on the association (Zhang: [0092], if target document data extractor receives a determination from document similarity determiner that a target document representation is similar to the first reference document representation, target document data extractor may utilize the extraction model associated with the first reference document representation to extract the content from the target electronic document. By way of example, target document data extractor may utilize the terms ‘Amount Due by 11/5/19’, surrounding words, spacing, and/or orientation that is determined based on the first reference electronic document (or representation thereof) to determine and/or extract the actual dollar amount (e.g., $304.56) from the text of the target electronic document); and 
transmitting, by the data extraction application, the billing fields with the extracted data to a user electronic device (Zhang: [0096], a consumer application(s) may include a graphical user interface that causes the extracted data to be presented on a display of a computing device).  

Zhang fails to disclose methods for machine-assisted document input, comprising: 
applying to the text groups in the transcript, by the data extraction application, a pattern matching algorithm that uses regular expressions to identify a plurality of billing fields based on a pattern of the text groups and the locations of the text groups in the document or email.

Pratt teaches methods for machine-assisted document input, comprising: 
applying to the text groups in the transcript, by the data extraction application, a pattern matching algorithm that uses regular expressions to identify a plurality of billing fields based on a pattern of the text groups and the locations of the text groups in the document or email (Pratt: Column 5, lines 13-52, fields are defined by extraction rules (e.g., regular expressions) that derive one or more values from the portion of raw machine data in each event that has a particular field specified by an extraction rule such as billing fields. An extraction rule can generally include any type of instruction for extracting values from data in events. In some cases, an extraction rule comprises a regular expression where a sequence of characters form a search pattern, in which case the rule is referred to as a “regex rule.” The system applies the regex rule to the event data to extract values for associated fields in the event data by searching the event data for the sequence of characters defined in the regex rule.).
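
For illustration only, the following minimal Python sketch shows how a “regex rule” of the general kind described in the cited passage of Pratt could be applied to text to extract values for named fields. The rule table, field names, patterns, and the apply_rules function are hypothetical and are not taken from Pratt or Zhang; the sample text reuses the “Amount Due by 11/5/19” and $304.56 example quoted from Zhang above.

```python
import re

# Hypothetical extraction rules: each field name maps to a regular expression
# whose first capture group holds the field value. These patterns are
# illustrative assumptions, not rules disclosed in either reference.
EXTRACTION_RULES = {
    "account_number": re.compile(
        r"Account\s+(?:No\.?|Number)[:\s]+([A-Z0-9-]+)", re.IGNORECASE
    ),
    "amount_due": re.compile(
        r"Amount\s+Due[^$]*\$\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def apply_rules(text, rules=EXTRACTION_RULES):
    """Search the text with each regex rule and collect the matched values."""
    fields = {}
    for name, pattern in rules.items():
        match = pattern.search(text)
        if match:
            # Keep only the captured value, not the label that located it.
            fields[name] = match.group(1)
    return fields


sample = "Account Number: 12-3456\nAmount Due by 11/5/19   $304.56"
print(apply_rules(sample))
```

In this sketch a field for which no pattern matches is simply omitted from the result, so a caller can tell which billing fields were not found.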

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to include using text location through pattern matching as taught by Pratt, for identifying a vendor from document contents as taught by Zhang with the motivation of increasing efficiency by presenting information that is relevant while filtering out the information that is not relevant (Pratt: Column 7, lines 21-35).

As per Claim 16, Zhang discloses a method for machine-assisted document input, comprising: 
receiving, at a data extraction application executed by a computer processor, a document or email, wherein the document or email comprises a billing statement (Zhang: [0033], receive a target electronic document over the network, and may include a financial document or invoice); 
generating, by the data extraction application, a transcript of the document or email, wherein the transcript comprises a plurality of text groups from the document or email and a location for each text group in the document or email (Zhang: [0034], Text determiner may receive an electronic document, such as an image document (e.g., a PDF document), and then determines text contained within the electronic document, and [0051] the text modifier may determine a set of logic, rules, conditions, associations, or classification models (e.g., automatically, such as through machine learning, or manually through manual input) based on the characteristics of the identified portion of text to identify predefined portions of text in future documents. Example characteristics include a predefined term(s), a format of the text, symbols, numeric text, or context of the text associated with the portion of text identified by the user); 
wherein the billing fields comprise a vendor name field, a vendor address billing field, an account number billing field, and an amount billing field (Zhang: [0017] extraction model may be used to identify particular types of data within the document (e.g., an amount due, a due date, a customer or company name, or contact information));
extracting, by the data extraction application, each of the text groups into one of the billing fields based on the association (Zhang: [0092], if target document data extractor receives a determination from document similarity determiner that a target document representation is similar to the first reference document representation, target document data extractor may utilize the extraction model associated with the first reference document representation to extract the content from the target electronic document. By way of example, target document data extractor may utilize the terms ‘Amount Due by 11/5/19’, surrounding words, spacing, and/or orientation that is determined based on the first reference electronic document (or representation thereof) to determine and/or extract the actual dollar amount (e.g., $304.56) from the text of the target electronic document); and 
transmitting, by the data extraction application, the billing fields with the extracted data to a user electronic device (Zhang: [0096], a consumer application(s) may include a graphical user interface that causes the extracted data to be presented on a display of a computing device).  

Zhang fails to disclose methods for machine-assisted document input, comprising: 
applying to the text groups in the transcript, by the data extraction application, a pattern matching algorithm that uses regular expressions to identify billing fields based on a pattern of the text groups and locations in the document or email.

Pratt teaches methods for machine-assisted document input, comprising: 
applying to the text groups in the transcript, by the data extraction application, a pattern matching algorithm that uses regular expressions to identify billing fields based on a pattern of the text groups and locations in the document or email (Pratt: Column 5, lines 13-52, fields are defined by extraction rules (e.g., regular expressions) that derive one or more values from the portion of raw machine data in each event that has a particular field specified by an extraction rule such as billing fields. An extraction rule can generally include any type of instruction for extracting values from data in events. In some cases, an extraction rule comprises a regular expression where a sequence of characters form a search pattern, in which case the rule is referred to as a “regex rule.” The system applies the regex rule to the event data to extract values for associated fields in the event data by searching the event data for the sequence of characters defined in the regex rule.).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Zhang to include using text location through pattern matching as taught by Pratt, for identifying a vendor from document contents as taught by Zhang with the motivation of increasing efficiency by presenting information that is relevant while filtering out the information that is not relevant (Pratt: Column 7, lines 21-35).

Response to Arguments

Applicant’s arguments, see Applicant Arguments/Remarks Made in an Amendment, filed June 24, 2022, with respect to the rejection(s) of claim(s) 1-2, 4-9, 11, 14-16, and 19-20 under 35 USC 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made in view of US Pat Pub No 2021/0133498 “Zhang”, in view of US Pat No 10,673,880 “Pratt”.



Conclusion

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to REVA R MOORE whose telephone number is (571)270-7942. The examiner can normally be reached M-Th: 9:00-6:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fahd Obeid can be reached on 571-270-3324. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/REVA R MOORE/Examiner, Art Unit 3687
/PETER LUDWIG/Primary Examiner, Art Unit 3687