DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This is a Non-Final Office Action in response to application 16/577,821 entitled "ANOMALY AND FRAUD DETECTION USING DUPLICATE EVENT DETECTOR" filed on September 20, 2019 with claims 1-4 and 8-20 pending.
Status of Claims
Claims 1, 18, and 20 have been amended and are hereby entered.
Claims 5-7 were previously cancelled. Claim 21 is newly cancelled.
Claims 1-4 and 8-20 are pending and have been examined.

Response to Amendment
The amendment filed April 28, 2022, has been entered. Claims 1-4 and 8-20 remain pending in the application. Applicant’s amendments to the Specification, Drawings, and/or Claims, filed in response to the Final Office Action mailed March 2, 2021, have been noted.

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on October 1, 2019, December 21, 2020, May 13, 2021, January 25, 2022, and June 3, 2022, are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-4 and 8-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claims 1-4 and 8-20 are directed to a system, a method, or a computer program product, each of which falls within one of the statutory categories of invention. (Step 1: YES).
The claimed invention is directed to an abstract idea without significantly more. 
Independent Claim 1 recites: 
“receiving… the image of the first receipt associated with an expense …”
“determining that the user is associated with the first entity”
 “generating a compound key…”
“determining whether the compound key matches an existing compound key….”
“identifying the first receipt as a non-duplicate receipt;”
“identifying the first receipt as a duplicate receipt;”
“automatically generating a duplicate receipt event…”
“receiving an image of a second receipt…”
 “determining whether the second receipt is a non-duplicate receipt or a duplicate receipt…”
 “identifying the initial classification as a false positive classification”
“reclassifying the second receipt…”
These limitations clearly relate to managing transactions and interactions between a customer and a vendor/merchant. Under their broadest reasonable interpretation, these limitations cover performance of certain methods of organizing human activity. For example, receiving an image of a receipt associated with an expense on an expense report, identifying a non-duplicate receipt, or generating a duplicate receipt event recites a commercial or legal action, principle, or practice, as well as managing interactions between people. If a claim limitation, under its broadest reasonable interpretation, covers performance of a commercial or legal action, principle, or practice, then it falls within the “Certain Methods of Organizing Human Activity” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. (Step 2A-Prong 1: YES. The claims recite an abstract idea.)
This judicial exception is not integrated into a practical application. In particular, the claims recite the additional elements of:
“training at least one machine learning extraction model”, “learning, by the trained at least one machine learning extraction model…”, “modifying the trained at least one machine learning extraction model”, “updating the first instance of the trained at least one machine learning extraction model”:
merely applying machine learning technology to the abstract idea
 “extracting tokens from the receipt using machine learning extraction models”
insignificant extra-solution activity (data gathering) appended to the judicial exception
mere instructions to implement an abstract idea on a computer, or merely using machine learning as a tool to perform an abstract idea
“compound key”; “generating an updated compound key…”; “generating a compound key for the second receipt …”; “using the updated compound key to classify future receipts…”
generally linking to the particular technology of data and database management
 “database”
generally linking to the particular technology of database and data storage

These additional elements are recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using generic computer components and/or electronic processes. They also generally link the use of the abstract idea to the particular technological environment of databases and data integrity. For example, the Applicant’s Specification reads, “[0046] Indeed, the server 102 and the client devices 104 and 105 may be any computer or processing device such as, for example, a blade server, general-purpose personal computer (PC), Mac®, workstation, UNIX-based workstation, or any other suitable device....Further, the server 102 and the client devices 104 and 105 may be adapted to execute any operating system, including Linux, UNIX, Windows….[0027] various machine learning approaches can be employed to replace and/or augment human auditors.” Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The additional elements merely add instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea; see MPEP 2106.05(f). Accordingly, these additional elements, when considered separately and as an ordered combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea and are recited at a high level of generality. Therefore, Claim 1 is directed to an abstract idea without a practical application. (Step 2A-Prong 2: NO. The additional claimed elements are not integrated into a practical application.)
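Dependent Claims 2, 3, 10, and 13 recite additional elements; dependent Claims 4, 8, 9, 11, 12, and 14-17 do not.
This judicial exception is not integrated into a practical application. In particular, the additional elements recited in the dependent claims are identified below: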
Claim 2: 
“compound key”
generally linking to the particular technology of data and database management
“tokens” 
generally linking to the particular technology of tokenization
“one-way non-reversible hash value”
 generally linking to the particular technology of data hashing
Claim 3: 
“compound key”
generally linking to the particular technology of data and database management
Claim 4: (none found: does not include additional elements and merely narrows the abstract idea.)
Claim 8: (none found: does not include additional elements and merely narrows the abstract idea.)
Claim 9: (none found: does not include additional elements and merely narrows the abstract idea.)
Claim 10: “extracted tokens”
generally linking to the particular technology of automation and tokenization
Claim 11: (none found: does not include additional elements and merely narrows the abstract idea.)
Claim 12: (none found: does not include additional elements and merely narrows the abstract idea.)
Claim 13: “machine learning engine”
merely applying the technology of machine learning to the abstract idea
Claim 14: (none found: does not include additional elements and merely narrows the abstract idea.)
Claim 15: (none found: does not include additional elements and merely narrows the abstract idea.)
Claim 16: (none found: does not include additional elements and merely narrows the abstract idea.)
Claim 17: (none found: does not include additional elements and merely narrows the abstract idea.)
These additional elements are recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using generic computer components and/or electronic processes. With regard to the additional elements of “receiving an image…” and “extracting tokens…”, these elements are insufficient to integrate the abstract idea into a practical application because they are merely insignificant extra-solution activity, specifically data gathering. Accordingly, these additional elements, when considered separately and as an ordered combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea and are recited at a high level of generality. Therefore, these dependent claims are directed to an abstract idea without a practical application. (Step 2A-Prong 2: NO. The additional claimed elements are not integrated into a practical application.)
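For context on the Claim 2 elements identified above (a compound key formed from tokens and a one-way non-reversible hash value), the following Python sketch shows one possible way such a key could be computed. It is illustrative only: the field names in KEY_FIELDS, the normalization step, the separator character, and the use of SHA-256 as the one-way hash are assumptions and are not taken from the claims.

```python
import hashlib

# Hypothetical token fields; the claims do not identify which tokens form the key.
KEY_FIELDS = ("entity", "vendor", "date", "amount")

# ASCII unit separator, assumed never to occur in extracted receipt text.
SEPARATOR = "\x1f"


def compound_key(tokens: dict, fields: tuple = KEY_FIELDS) -> str:
    """Return a hex digest over the selected token values, in a fixed field order."""
    # Normalize case and surrounding whitespace so trivial OCR variations
    # do not change the key (an assumed design choice).
    parts = [str(tokens.get(name, "")).strip().lower() for name in fields]
    # SHA-256 stands in for the "one-way non-reversible hash value".
    return hashlib.sha256(SEPARATOR.join(parts).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = {"entity": "Acme", "vendor": "Cafe Uno", "date": "2019-09-20", "amount": "12.50"}
    resubmitted = {"entity": "ACME", "vendor": "cafe uno ", "date": "2019-09-20", "amount": "12.50"}
    print(compound_key(first) == compound_key(resubmitted))  # True
```

Because only the digest is stored, the token values cannot be recovered from the key, while identical normalized tokens always produce the same key.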
Independent Claim 18 recites: 
 “receiving… the image of the first receipt associated with an expense …”
“determining that the user is associated with the first entity”
 “generating a compound key…”
“determining whether the compound key matches an existing compound key….”
“identifying the first receipt as a non-duplicate receipt;”
“identifying the first receipt as a duplicate receipt;”
“automatically generating a duplicate receipt event…”
“receiving an image of a second receipt…”
 “determining whether the second receipt is a non-duplicate receipt or a duplicate receipt…”
 “identifying the initial classification as a false positive classification”
“reclassifying the second receipt…”
These limitations clearly relate to managing transactions and interactions between a customer and a vendor/merchant. Under their broadest reasonable interpretation, these limitations cover performance of certain methods of organizing human activity. For example, receiving an image of a receipt associated with an expense on an expense report, identifying a non-duplicate receipt, or generating a duplicate receipt event recites a commercial or legal action, principle, or practice, as well as managing interactions between people. If a claim limitation, under its broadest reasonable interpretation, covers performance of a commercial or legal action, principle, or practice, then it falls within the “Certain Methods of Organizing Human Activity” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. (Step 2A-Prong 1: YES. The claims recite an abstract idea.)
This judicial exception is not integrated into a practical application. In particular, the claims recite the additional elements of: 
“computers”, “a computer-readable medium coupled to the one or more computers having instructions stored thereon which, when executed by the one or more computers, cause the one or more computers to perform operations” 
merely applying computer technology to the abstract idea
“receiving image of a receipt associated with an expense on an expense report”
insignificant extra-solution activity (data gathering) appended to the judicial exception
“extracting tokens from the receipt using machine learning extraction models”
insignificant extra-solution activity (data gathering) appended to the judicial exception
merely applying the technology of machine learning to the abstract idea
“compound key”; “generating an updated compound key…”; “generating a compound key for the second receipt …”; “using the updated compound key to classify future receipts…”
generally linking to the particular technology of data and database management
  “database” 
merely applying database technology to the abstract idea
“training at least one machine learning extraction model”, “learning, by the trained at least one machine learning extraction model…”, “modifying the trained at least one machine learning extraction model”:
merely applying machine learning technology to the abstract idea

These additional elements are recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using generic computer components and/or electronic processes. They also generally link the use of the abstract idea to the particular technological environment of databases and data integrity. For example, the Applicant’s Specification reads, “[0046] Indeed, the server 102 and the client devices 104 and 105 may be any computer or processing device such as, for example, a blade server, general-purpose personal computer (PC), Mac®, workstation, UNIX-based workstation, or any other suitable device....Further, the server 102 and the client devices 104 and 105 may be adapted to execute any operating system, including Linux, UNIX, Windows….[0027] various machine learning approaches can be employed to replace and/or augment human auditors.” Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The additional elements merely add instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea; see MPEP 2106.05(f). Accordingly, these additional elements, when considered separately and as an ordered combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea and are recited at a high level of generality. Therefore, Claim 18 is directed to an abstract idea without a practical application. (Step 2A-Prong 2: NO. The additional claimed elements are not integrated into a practical application.)
Dependent Claim 19 recites no additional elements; it merely further narrows the abstract idea.
Independent Claim 20 recites: 
 “receiving… the image of the first receipt associated with an expense …”
 “generating a compound key…”
“determining whether the compound key matches an existing compound key….”
“identifying the first receipt as a non-duplicate receipt;”
“identifying the first receipt as a duplicate receipt;”
“automatically generating a duplicate receipt event…”
“receiving an image of a second receipt…”
 “determining whether the second receipt is a non-duplicate receipt or a duplicate receipt…”
“generating a compound key for the second receipt …”
“reclassifying the second receipt…”
“using the updated compound key to classify future receipts…”
These limitations clearly relate to managing transactions and interactions between a customer and a vendor/merchant. Under their broadest reasonable interpretation, these limitations cover performance of certain methods of organizing human activity. For example, receiving an image of a receipt associated with an expense on an expense report, identifying a non-duplicate receipt, or generating a duplicate receipt event recites a commercial or legal action, principle, or practice, as well as managing interactions between people. If a claim limitation, under its broadest reasonable interpretation, covers performance of a commercial or legal action, principle, or practice, then it falls within the “Certain Methods of Organizing Human Activity” grouping of abstract ideas. Accordingly, the claim recites an abstract idea. (Step 2A-Prong 1: YES. The claims recite an abstract idea.)
This judicial exception is not integrated into a practical application. In particular, the claims recite the additional elements of: 
“computer program product”, “a non-transitory storage medium, the product comprising non-transitory, computer readable instructions for causing one or more processors to perform operations”
merely applying computer technology to the abstract idea
“receiving image of a receipt associated with an expense on an expense report”
insignificant extra-solution activity (data gathering) appended to the judicial exception
 “extracting tokens from the receipt using machine learning extraction models”   
generally linking to the particular technology of automation and tokenization
“training at least one machine learning extraction model”, “learning, by the trained at least one machine learning extraction model…”, “modifying the trained at least one machine learning extraction model”:
merely applying machine learning technology to the abstract idea

These additional elements are recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using generic computer components and/or electronic processes. They also generally link the use of the abstract idea to the particular technological environment of databases and data integrity. For example, the Applicant’s Specification reads, “[0046] Indeed, the server 102 and the client devices 104 and 105 may be any computer or processing device such as, for example, a blade server, general-purpose personal computer (PC), Mac®, workstation, UNIX-based workstation, or any other suitable device....Further, the server 102 and the client devices 104 and 105 may be adapted to execute any operating system, including Linux, UNIX, Windows….[0027] various machine learning approaches can be employed to replace and/or augment human auditors.” Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The additional elements merely add instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea; see MPEP 2106.05(f). Accordingly, these additional elements, when considered separately and as an ordered combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea and are recited at a high level of generality. Therefore, Claim 20 is directed to an abstract idea without a practical application. (Step 2A-Prong 2: NO. The additional claimed elements are not integrated into a practical application.)
Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Accordingly, these additional elements do not change the outcome of the analysis when considered separately and as an ordered combination. The dependent claims further define the abstract idea that is present in their respective independent claims and thus correspond to Certain Methods of Organizing Human Activity; hence, they are abstract for the reasons presented above. The dependent claims do not include any additional elements that integrate the abstract idea into a practical application or that are sufficient to amount to significantly more than the judicial exception when considered both individually and as an ordered combination. Therefore, the dependent claims are directed to an abstract idea. Thus, Claims 1-4 and 8-20 are not patent eligible. (Step 2B: NO. The claims do not provide significantly more.)
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4 and 8-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bhatnagar (“DATA EXTRACTION AND DUPLICATE DETECTION”, U.S. Publication Number: 2020/0104587 A1) in view of Dirac (“EFFICIENT DUPLICATE DETECTION FOR MACHINE LEARNING DATA SETS”, U.S. Publication Number: 2015/0379430 A1), Oxford (“METHOD AND SYSTEM FOR PROCESS WORKING SET ISOLATION”, U.S. Publication Number: 2013/0254494 A1), and Mehta (“METHODS AND SYSTEMS FOR UPDATING A DATABASE BASED ON OBJECT RECOGNITION”, U.S. Publication Number: 2020/0218890 A1).

Regarding Claim 1, 
Bhatnagar teaches,
prior to receiving an image of a first receipt, training at least one machine learning extraction model using historical receipt images, historical receipt text, and historical data values; 
(Bhatnagar [0014] reading invoices (both pdfs and images), extracting key relevant information from the face of invoices, organizing the relevant information in a structured template as a key-value pair
Bhatnagar [0034] associated labels may be used as inputs to train a classifying engine.
Bhatnagar [0017]  using machine learning
Bhatnagar [0049] operations may be machine operations or any of the operations may be conducted or enhanced by artificial intelligence (AI) or machine learning. 
Bhatnagar [0015]  A duplicate detection module may compare the structured text with invoices stored in a historic invoice repository.
Bhatnagar [0004] that the invoice is a duplicate of a historic invoice in a historic invoice database;
Bhatnagar [0040]  the steps recited in any of the method or process descriptions may be executed in any order and are not limited to the order presented. Moreover, any of the functions or steps may be outsourced to or performed by one or more third parties....steps may be performed in any suitable order.)
receiving, while a user is creating an expense report, the image of the first receipt associated with an expense on the expense report;
(Bhatnagar [0004] including receiving an invoice; performing optical character recognition on the invoice
Bhatnagar [0014]  invoice processing which includes reading invoices (both pdfs and images)
Bhatnagar [0016] expense reporting)
extracting first tokens from the first receipt using the first instance of the trained at least one machine learning extraction model that is configured with the first set of parameters;
(Bhatnagar [0004] extracting a plurality of key-value pairs from the invoice;…determining, using a duplicate model and based on the feature vector, that the invoice is a duplicate of a historic invoice in a historic invoice database;
Bhatnagar [0004]  receiving an input to the duplicate model 
Bhatnagar [0034]  vectors and associated labels may be used as inputs to train
Bhatnagar [0049] Rather, the operations may be machine operations or any of the operations may be conducted or enhanced by artificial intelligence (AI) or machine learning.)
Examiner notes that a key-value pair amounts to a “token.”
determining whether the key matches an existing compound key in a database of historical receipts;
(Bhatnagar [0004]     extracting a plurality of key-value pairs from the invoice; comprising the plurality of key-value pairs; forming a feature vector;  
Bhatnagar [0014]  organizing the relevant information in a structured template as a key-value pair
Bhatnagar [0005]   forming the feature vector may comprise concatenating similarity measures across different fields.
Bhatnagar [0003] processing invoices from multiple sources and formats to organize the key information in a structured manner
Bhatnagar [0033]  The duplicate detection module may compare corresponding fields across two invoices and compute different similarity measures.)
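Bhatnagar [0005] and [0033], as cited above, describe comparing corresponding fields across two invoices, computing similarity measures, and concatenating those measures into a feature vector. The Python sketch below illustrates that pairwise comparison under stated assumptions: the field names in FIELDS are hypothetical, and difflib's SequenceMatcher ratio is an assumed stand-in for Bhatnagar's unspecified "different similarity measures."

```python
from difflib import SequenceMatcher

# Hypothetical invoice fields; Bhatnagar's actual field list is not reproduced here.
FIELDS = ("vendor", "invoice_number", "date", "amount")


def field_similarity(a: str, b: str) -> float:
    """Assumed similarity measure: difflib's ratio, which is 1.0 for identical strings."""
    return SequenceMatcher(None, a, b).ratio()


def pair_feature_vector(invoice_a: dict, invoice_b: dict) -> list:
    """Compare corresponding fields and concatenate one similarity value per field."""
    return [
        field_similarity(str(invoice_a.get(f, "")), str(invoice_b.get(f, "")))
        for f in FIELDS
    ]


if __name__ == "__main__":
    test_invoice = {"vendor": "Acme Supply", "invoice_number": "INV-1001",
                    "date": "2019-09-20", "amount": "250.00"}
    historic = {"vendor": "ACME Supply", "invoice_number": "INV-1001",
                "date": "2019-09-20", "amount": "250.00"}
    print(pair_feature_vector(test_invoice, historic))
```

Unlike an exact compound-key lookup, which yields a single match or no-match result, this vector is graded per field; per Bhatnagar [0004], a duplicate model then determines from the feature vector whether the invoice is a duplicate.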
wherein the database of historical receipts includes receipts from users of the first entity and from users of the second entity;
(Bhatnagar [0004] a historic invoice in a historic invoice database
Bhatnagar [0019]  a historic invoice repository 130, and one or more invoice sources, such as a supplier server 141, a supplier client computer)
in response to determining that the compound key does not match an existing key:
(Bhatnagar [0004]     extracting a plurality of key-value pairs from the invoice; comprising the plurality of key-value pairs; forming a feature vector;  
Bhatnagar [0005]   forming the feature vector may comprise concatenating similarity measures across different fields.
 Bhatnagar [0014]  organizing the relevant information in a structured template as a key-value pair
Bhatnagar [0003] processing invoices from multiple sources and formats to organize the key information in a structured manner
Bhatnagar [0033]  The duplicate detection module may compare corresponding fields across two invoices and compute different similarity measures. 
Bhatnagar [0020]  Unique invoices may be processed)
identifying the first receipt as a non-duplicate receipt; and
(Bhatnagar [0020] Unique invoices may be processed
Bhatnagar [0049] No such capability of a human operator is necessary, or desirable in most cases, in any of the operations described herein. Rather, the operations may be machine operations or any of the operations may be conducted or enhanced by artificial intelligence (AI) or machine learning.)
in response to determining that the compound key matches an existing key:
(Bhatnagar [0004]     extracting a plurality of key-value pairs from the invoice; comprising the plurality of key-value pairs; forming a feature vector;  
Bhatnagar [0005]   forming the feature vector may comprise concatenating similarity measures across different fields.
 Bhatnagar [0014]  organizing the relevant information in a structured template as a key-value pair
Bhatnagar [0003] processing invoices from multiple sources and formats to organize the key information in a structured manner
Bhatnagar [0033]  The duplicate detection module may compare corresponding fields across two invoices and compute different similarity measures. 
Bhatnagar [0020]  The merchant system 110 may compare the key data fields and identify duplicate invoices. Duplicate invoices may be rejected and returned to the supplier. Unique invoices may be processed)
automatically identifying the first receipt as a duplicate receipt; and
(Bhatnagar [0004] data extraction and invoice duplicate detection are disclosed
Bhatnagar  [0014]  identify potential duplicate invoices.)
automatically generating a duplicate receipt event, including providing a duplicate receipt notification to the user before the user submits the expense report.
(Bhatnagar [0020]  Duplicate invoices may be rejected and returned to the supplier.
Bhatnagar [0049] No such capability of a human operator is necessary, or desirable in most cases, in any of the operations described herein. Rather, the operations may be machine operations or any of the operations may be conducted or enhanced by artificial intelligence (AI) or machine learning.
Bhatnagar  [0031] generating a potential duplicate review report
Bhatnagar [0014]  invoice processing which includes reading invoices (both pdfs and images)
Bhatnagar [0016] The system may enable detection of duplicate invoices.   This may also result in easier ... expense reporting.... and/or other electronic data may be available in real time or soon after an invoice is received.
Bhatnagar [0040] the steps recited in any of the method or process descriptions may be executed in any order and are not limited to the order presented...steps may be performed in any suitable order. 
Bhatnagar  [0016] This may also result in easier accounting, book keeping, expense reporting)
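The claimed branch on whether the compound key matches an existing key (non-duplicate versus duplicate, with a duplicate receipt event raised before the expense report is submitted) can be sketched as follows. This is a hypothetical illustration only: the in-memory dictionary stands in for the database of historical receipts, and the class name, event structure, and return strings are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DuplicateReceiptChecker:
    """Hypothetical in-memory stand-in for the database of historical receipts."""

    known_keys: dict = field(default_factory=dict)  # compound key -> receipt id
    events: list = field(default_factory=list)      # pending duplicate receipt events

    def classify(self, receipt_id: str, key: str) -> str:
        existing = self.known_keys.get(key)
        if existing is not None:
            # Key matches an existing key: flag as a duplicate and record an event
            # that a front end could surface before the report is submitted.
            self.events.append(
                {"type": "duplicate_receipt", "receipt": receipt_id, "matches": existing}
            )
            return "duplicate"
        # No match: treat as a non-duplicate and remember the key for later receipts.
        self.known_keys[key] = receipt_id
        return "non-duplicate"


if __name__ == "__main__":
    checker = DuplicateReceiptChecker()
    print(checker.classify("receipt-1", "k1"))  # non-duplicate
    print(checker.classify("receipt-2", "k1"))  # duplicate
    print(checker.events[0]["matches"])         # receipt-1
```

Storing only the keys of receipts classified as non-duplicates keeps the first-seen receipt as the reference against which later receipts are matched.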
 updating the first instance of the trained at least one machine learning extraction model based on the first receipt;
(Bhatnagar [0014] reading invoices (both pdfs and images), 
Bhatnagar [0033]  The duplicate detection module may compare corresponding fields across two invoices and compute different similarity measures....The duplicate detection module may repeat this process to generate feature vectors by pairing each test invoice with invoices form the repository.
Bhatnagar [0035]  The duplicate detection module may refine the duplicate model's prediction if it conflicts with any of the domain rules.
Bhatnagar [0036]  the duplicate model to update and re-train the duplicate model 
Bhatnagar [0017]  using machine learning 
Bhatnagar [0049] operations may be machine operations or any of the operations may be conducted or enhanced by artificial intelligence (AI) or machine learning. 
Bhatnagar [0015]  A duplicate detection module may compare the structured text with invoices stored in a historic invoice repository.)
receiving an image of a second receipt associated with a second expense on the expense report;
(Bhatnagar [0021] receive a plurality of invoices, which may be from different vendors and in different
Bhatnagar [0014] reading invoices (both pdfs and images), extracting key relevant information from the face of invoices 
Bhatnagar [0016] The system may enable detection of duplicate invoices.   This may also result in easier ... expense reporting)
extracting second tokens from the second receipt using the updated first instance of the trained at least one machine learning extraction model; 
(Bhatnagar [0021] receive a plurality of invoices, which may be from different vendors and in different formats
Bhatnagar [0014] reading invoices (both pdfs and images), extracting key relevant information from the face of invoices 
Bhatnagar [0016] The system may enable detection of duplicate invoices.   This may also result in easier ... expense reporting
Bhatnagar [0004]  extracting a plurality of key-value pairs from the invoice;
Bhatnagar [0049] Rather, the operations may be machine operations or any of the operations may be conducted or enhanced by artificial intelligence (AI) or machine learning.)
and determining whether the second receipt is a non-duplicate receipt or a duplicate receipt based on the second tokens.
(Bhatnagar [0004]  extracting a plurality of key-value pairs from the invoice;
Bhatnagar [0020] The merchant system 110 may compare the key data fields and identify duplicate invoices. Duplicate invoices may be rejected and returned to the supplier. Unique invoices may be processed and stored
Bhatnagar [Claim 6] further comprising saving, by the computer-based system, a value and location of a field in the invoice.
Bhatnagar [0037] The invoice may then be stored in the historic invoice database.)
generating a compound key for the second receipt based on the second tokens and classifying the second receipt, in an initial classification, as a duplicate receipt based on the compound key for the second receipt matching an existing compound key of a third receipt;
(Bhatnagar [0004]     extracting a plurality of key-value pairs from the invoice; comprising the plurality of key-value pairs; forming a feature vector;  
Bhatnagar [0005]   forming the feature vector may comprise concatenating similarity measures across different fields.
 Bhatnagar [0014]  organizing the relevant information in a structured template as a key-value pair

Bhatnagar [Abstract] comparing invoices 
Bhatnagar [0015]  A duplicate detection module may compare the structured text with invoices stored in a historic invoice repository
Bhatnagar [0029]    to be classified 
Bhatnagar  [0020]  system 110 may compare the key data fields and identify duplicate invoices. 
Bhatnagar  [0042]  “match,”...may include an identical match, a partial match, meeting certain criteria, matching a subset of data, a correlation, satisfying certain criteria, a correspondence, an association, an algorithmic relationship, and/or the like.)
learning, by the first instance of the trained at least one machine learning extraction model, that at least one field from the second receipt not in the existing compound key is different between the second receipt and the third receipt, and that the differences between the at least one field indicate that the second receipt is not a duplicate of the third receipt;
(Bhatnagar [0049] operations may be conducted or enhanced by artificial intelligence (AI) or machine learning.
Bhatnagar [0033] duplicate detection module may compare corresponding fields across two invoices and compute different similarity measures.
Bhatnagar [0034] the duplicate detection module may assign a label {0, 1} to the feature vector corresponding to each pair. The value may be a 1 if the pair is a duplicate pair, or a 0 if the pair is not a duplicate.)
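For illustration only, the pairwise comparison described in Bhatnagar [0033]-[0034] (and the concatenation of similarity measures across fields in [0005]) can be sketched as follows. The field names, the sample values, and the use of difflib's ratio() as the similarity measure are assumptions; Bhatnagar does not specify which fields or which similarity measures are used.

```python
from difflib import SequenceMatcher

# Hypothetical invoice fields; Bhatnagar does not list the fields it compares.
FIELDS = ("invoice_number", "vendor", "date", "amount")


def field_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] for two field values.

    difflib's ratio() is an assumed stand-in for Bhatnagar's unspecified
    "different similarity measures".
    """
    return SequenceMatcher(None, a, b).ratio()


def feature_vector(inv_a: dict, inv_b: dict) -> list:
    """Concatenate one similarity value per field into a pair feature vector."""
    return [field_similarity(str(inv_a[f]), str(inv_b[f])) for f in FIELDS]


def labeled_pair(inv_a: dict, inv_b: dict, is_duplicate: bool) -> tuple:
    """Pair the feature vector with a {0, 1} label (1 = duplicate pair)."""
    return feature_vector(inv_a, inv_b), 1 if is_duplicate else 0


a = {"invoice_number": "INV-100", "vendor": "Acme", "date": "2019-09-20", "amount": "42.00"}
b = dict(a, amount="24.00")  # identical to a except for the amount field

vec, label = labeled_pair(a, b, is_duplicate=False)
```

In this sketch, labeled vectors of this form would serve as training examples for a duplicate model; only the amount position of vec falls below 1.0, which is the kind of field-level difference the claim limitation above relies on.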
reclassifying the second receipt, in a second classification, as a non-duplicate receipt, to correct a false positive for the second receipt;
(Bhatnagar [0027] data extraction module may create regular expressions for all possible variations of different fields...check if the extracted values are valid and satisfies the field criteria. If not, the data extraction module may reassign the field value)
of the second receipt and the second classification of the second receipt
(Bhatnagar [Abstract] comparing invoices 
Bhatnagar [0015]  A duplicate detection module may compare the structured text with invoices stored in a historic invoice repository
Bhatnagar [0029]    to be classified)
generating an updated compound key by updating a structure of the compound key used by the first instance of the at least one machine learning extraction model to include both existing fields of the compound key and new compound key fields that include the at least one field; 
(Bhatnagar [0004] extracting a plurality of key-value pairs from the invoice; generating a structured template comprising the plurality of key-value pairs; forming a feature vector; ...and modifying, based on the input, the duplicate model.
Bhatnagar [0005] forming the feature vector may comprise concatenating similarity measures across different fields.)
modifying the trained first instance of the at least one machine learning extraction model to process future receipts by extracting the at least one field and using the updated compound key that includes both the existing compound key and the new compound key fields; 
(Bhatnagar [0036] re-train the duplicate model with the corrected labels for the pairs.
Bhatnagar [0033] duplicate detection module may repeat this process to generate feature vectors by pairing each test invoice with invoices form the repository.
Bhatnagar [0004] extracting a plurality of key-value pairs from the invoice; generating a structured template comprising the plurality of key-value pairs; forming a feature vector;
Bhatnagar [0005] forming the feature vector may comprise concatenating similarity measures across different fields.
Bhatnagar [0014] organizing the relevant information in a structured template as a key-value pair
Bhatnagar [Claim 1] extracting, by the computer-based system, a plurality of key-value pairs from the invoice)
using, by the first instance of the at least one machine learning extraction model, the updated compound key to classify a fourth receipt.
(Bhatnagar [0033] duplicate detection module may repeat this process to generate feature vectors by pairing each test invoice with invoices form the repository.)
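For illustration only, the compound-key limitations mapped above (initial duplicate classification on a key match, reclassification as a non-duplicate, and classification with an updated key that adds a distinguishing field) can be sketched as follows. The field names and values are hypothetical, and the sketch hard-codes the added field rather than modeling the machine learning step that identifies it.

```python
# Hypothetical receipt fields, chosen only for illustration.
BASE_KEY_FIELDS = ("amount", "vendor", "transaction_time")


def compound_key(receipt: dict, fields: tuple) -> tuple:
    """Build a compound key from a subset of the extracted tokens (fields)."""
    return tuple(receipt[f] for f in fields)


def is_duplicate(receipt: dict, existing: list, fields: tuple) -> bool:
    """Flag a receipt whose compound key matches any existing receipt's key."""
    key = compound_key(receipt, fields)
    return any(compound_key(r, fields) == key for r in existing)


third = {"amount": "12.50", "vendor": "Cafe", "transaction_time": "2019-09-20T08:15", "item": "coffee"}
second = {"amount": "12.50", "vendor": "Cafe", "transaction_time": "2019-09-20T08:15", "item": "tea"}

# Initial classification: the base keys match, so the second receipt is flagged.
initial = is_duplicate(second, [third], BASE_KEY_FIELDS)

# Updated compound key: existing fields plus the field found to differ.
UPDATED_KEY_FIELDS = BASE_KEY_FIELDS + ("item",)
reclassified = is_duplicate(second, [third], UPDATED_KEY_FIELDS)
```

Here initial is True (a false positive under the base key) and reclassified is False once the differing "item" field is part of the key.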
Bhatnagar does not teach generating a compound key using a subset of the first tokens, wherein the subset includes a transaction time; while a user is creating an expense report; and while the user continues to create the expense report; wherein the training includes configuring a first instance of the at least one machine learning model for a first entity using a first set of parameters and configuring a second instance of the at least one machine learning model for a second entity using a second set of parameters, wherein the first entity and the second entity are different entities and wherein the first set of parameters is different than the second set of parameters; associated with the first entity; determining that the user is associated with the first entity; first instance; identifying the initial classification as a false positive classification; updating the first instance of the trained at least one machine learning extraction model based on the false positive classification …, wherein updating the first instance of the trained at least one machine learning extraction model includes.
Dirac teaches,
wherein the training includes configuring a first instance of the at least one machine learning model for a first entity using a first set of parameters and configuring a second instance of the at least one machine learning model for a second entity using a second set of parameters, 
(Dirac [0199] At time t1, a training job J1 of a training-and-evaluation iteration TEI1 for a model M1 is begun.....At time t2, a training job J2 may be scheduled ....for a training-and-evaluation iteration TEI2 for a different model M2.
Dirac [0090]  The model developers may continue to experiment with various algorithms, parameters and/or input data sets to obtain improved versions of the underlying model
Dirac [0086] a number of different types of entities related to machine learning tasks may be generated, modified, read, executed, and/or queried/searched via MLS programmatic interfaces. Supported entity types in one embodiment may include...data sources)
wherein the first entity and the second entity are different entities 
(Dirac [0086] a number of different types of entities related to machine learning tasks may be generated, modified, read, executed, and/or queried/searched via MLS programmatic interfaces.
Dirac [0087] tasks (and the corresponding APIs) may involve multiple different entity types)
and wherein the first set of parameters is different than the second set of parameters; 
(Dirac [0108] result in the generation of a model training plan 428 (which may in turn involve several iterations of training, e.g., with different sets of parameters).
Dirac [0156]  Based on the result sets generated for the different parameter values 
Dirac [0354] Different parameter values may be selected)
associated with the first entity; 
(Dirac [0086] a number of different types of entities related to machine learning tasks)
determining that the user is associated with the first entity; 
(Dirac [0371] corresponding to numerous types of entities .... may enable multiple users or collaborators to share and re-use feature-processing recipes 
Dirac [0085]  user-defined functions...may be restricted to security containers defined by the provider network... for example the client's machine learning tasks are executed in an isolated, single-tenant fashion ... The term “MLS control plane” may be used herein to refer to a collection of hardware and/or software entities that are responsible for implementing various types of machine learning functionality on behalf of clients of the MLS)
first instance; 
(Dirac [0087] a programmatic interface (such as an API) may correspond to a request for one or more operations or tasks on one or more instances of a supported type of entity.)
identifying the initial classification as a false positive classification; 
(Dirac [0324]  an incorrect classification called a “false positive” results.)
updating the first instance of the trained at least one machine learning extraction model based on the false positive classification …, wherein updating the first instance of the trained at least one machine learning extraction model includes;
(Dirac [0090] experiment with various algorithms, parameters and/or input data sets to obtain improved versions of the underlying model.... changes to the underlying models
Dirac [0330] the client may decide that the interpretation threshold(s) for the model should be changed in a direction such that, in general, fewer false positive decisions would be likely to occur.
Dirac [0004] used for training the models)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data extraction and duplicate detection teachings of Bhatnagar to incorporate the efficient duplicate detection for machine learning data sets teachings of Dirac that examine “for the presence of duplicates of observation records of the first set in accordance with a probabilistic duplicate detection technique” (Dirac [Claim 1]). The modification would have been obvious because it merely applies a known technique (i.e., efficient duplicate detection for machine learning data sets) to a known concept (i.e., invoice data extraction and duplicate detection) ready for improvement to yield a predictable result (i.e., “notifying a client of a detection of potential duplicate observation records, … providing an indication of a particular observation record of the second set which has been identified as having a non-zero probability of being a duplicate” Dirac [Claim 10]).
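For illustration only, the threshold adjustment described in Dirac [0330] (changing an interpretation threshold so that fewer false positive decisions occur) can be sketched as follows. The scores, labels, and threshold values are hypothetical and are not taken from Dirac or from the application.

```python
def classify(scores: list, threshold: float) -> list:
    """Predict 1 (e.g., 'duplicate') when a model score meets the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def false_positives(preds: list, labels: list) -> int:
    """Count predictions of 1 whose true label is 0."""
    return sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)


# Hypothetical model scores and ground-truth labels (1 = true duplicate).
scores = [0.95, 0.70, 0.55, 0.40, 0.20]
labels = [1, 0, 1, 0, 0]

fp_low = false_positives(classify(scores, 0.5), labels)
fp_high = false_positives(classify(scores, 0.8), labels)
```

With these values, raising the threshold from 0.5 to 0.8 reduces the false positives from one to zero, at the cost of no longer flagging the true duplicate scored 0.55, which reflects the tradeoff Dirac describes.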
Dirac does not teach generating a compound key using a subset of the first tokens, wherein the subset includes a transaction time; while a user is creating an expense report; and while the user continues to create the expense report.
Oxford teaches,
generating a compound key using a subset of the first tokens,
(Oxford [0050] Each key must be combined with at least one other key to construct a compound key.)
wherein the subset includes a transaction time
(Oxford [0105] In such a case, the consistent use of the timestamp value as a component of a compound key)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data extraction and duplicate detection teachings of Bhatnagar to incorporate the data security teachings of Oxford, whose disclosure “relates to securing data (including instructions) associated with processes of a computing system” (Oxford [0002]). The modification would have been obvious because it merely applies a known technique (i.e., data security) to a known concept (i.e., invoice data extraction and duplicate detection) ready for improvement to yield a predictable result (i.e., “data of such security systems may likewise be secured, where by securing such data the effectiveness of such a security system may be enhanced” Oxford [0006]).
Oxford does not teach while a user is creating an expense report;  and while the user continues to create the expense report.
Mehta teaches,
while a user is creating an expense report;
(Mehta [0061] A user may also manually input the name of the location the picture was taken and then image capture device 202 can tag the picture.
Mehta [0143] including information in the associated image such as expense reports
Examiner interprets this as a user manually addressing image-related tasks as part of the process of generating an expense report.)
and while the user continues to create the expense report.
(Mehta [0061] A user may also manually input the name of the location the picture was taken and then image capture device 202 can tag the picture.
Mehta [0143] including information in the associated image such as expense reports
Examiner interprets this as a user manually addressing image-related tasks as part of the process of generating an expense report.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data extraction and duplicate detection teachings of Bhatnagar to incorporate the database update teachings of Mehta for “using object recognition techniques to update a database” (Mehta [0001]). The modification would have been obvious because it merely applies a known technique (i.e., database update procedures) to a known concept (i.e., invoice data extraction and duplicate detection) ready for improvement to yield a predictable result (i.e., “Such associations of individuals, objects, and/or data values or portions of data values may be recorded in a database of the system for further tracking and analysis, as well as other uses by the enterprise (including marketing, accounts payable, or other such uses). Users need not manually identify the individuals or objects or associations with the data values, reducing end user effort and reducing non-compliance. Rules may be applied to each association of object, individual, and/or data value, as well as across multiple associations, to automatically detect erroneous or fraudulent entries.” Mehta [0003]).

Regarding Claim 2, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 1 as described earlier.
Bhatnagar does not teach wherein generating the compound key comprises generating a one-way non-reversible hash value using the subset of the first tokens.
Oxford teaches,
wherein generating the compound key comprises generating a one-way non-reversible hash value using the subset of the first tokens.
(Oxford [0052] There are multiple methods to create a compound key, but two such mechanisms are given by way of example : one-way and reversible....In the one-way method, a compound key may be generated by a secure one-way hash function. In this example, a compound key can be generated by concatenating at least two precursor keys and passing the resultant concatenated data set through a one-way hash function. The one-way property of the hash function makes this kind of transformation irreversible.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data extraction and duplicate detection teachings of Bhatnagar to incorporate the data security teachings of Oxford, whose disclosure “relates to securing data (including instructions) associated with processes of a computing system” (Oxford [0002]). The modification would have been obvious because it merely applies a known technique (i.e., data security) to a known concept (i.e., invoice data extraction and duplicate detection) ready for improvement to yield a predictable result (i.e., “data of such security systems may likewise be secured, where by securing such data the effectiveness of such a security system may be enhanced” Oxford [0006]).
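For illustration only, the one-way method of Oxford [0052] (concatenating at least two precursor keys and passing the result through a one-way hash function), applied to receipt tokens as recited in Claim 2, can be sketched as follows. SHA-256 is an assumed choice of hash, the token values are hypothetical, and the length prefix on each part is an added assumption (Oxford describes plain concatenation) so that different token splits cannot produce the same concatenated input.

```python
import hashlib


def one_way_compound_key(*precursor_keys: bytes) -> str:
    """Concatenate precursor keys and pass the result through SHA-256."""
    if len(precursor_keys) < 2:
        raise ValueError("a compound key combines at least two precursor keys")
    # Length-prefix each part so (b"ab", b"c") and (b"a", b"bc") differ;
    # this framing is an assumption, not part of Oxford [0052].
    framed = b"".join(len(k).to_bytes(4, "big") + k for k in precursor_keys)
    return hashlib.sha256(framed).hexdigest()


# Hypothetical subset of receipt tokens, including a transaction time.
key = one_way_compound_key(b"12.50", b"Cafe", b"2019-09-20T08:15")
```

Because the hash is deterministic, two receipts with the same token subset yield the same key and can be matched without storing the underlying token values, while the one-way property keeps the key from being reversed into those values.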
Regarding Claim 3, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 1 as described earlier.
Bhatnagar teaches,
wherein the compound key
(Bhatnagar [0004] extracting a plurality of key-value pairs from the invoice;
Bhatnagar [0014] organizing the relevant information in a structured template as a key-value pair)
includes an amount, a vendor name, 
(Bhatnagar [0029] For a line to be classified as a valid table header, at least two phrases may have high overlap with description fields, e.g. “Date,” “Amount,” “Invoice Number,” “Description,” etc.)
and a vendor location.
(Bhatnagar [0030] The data extraction module may assimilate phrases from the next few lines which could potentially be a part of the corresponding address based on their spatial locations. 
Bhatnagar [0020] returned to the supplier.)
Regarding Claim 4, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 1 as described earlier.
Bhatnagar teaches,
further comprising performing one or more actions in response to the duplicate receipt event.
(Bhatnagar [0020]  Duplicate invoices may be rejected and returned to the supplier.)
Regarding Claim 8, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 4 as described earlier.
Bhatnagar teaches,
wherein the one or more actions comprise sending a duplicate receipt notification
(Bhatnagar [0031] generating a potential duplicate review report)
Bhatnagar does not teach to a manager of the user.
Mehta teaches,
to a manager of the user.
(Mehta [0003] values to be associated with objects or individuals (including potentially one or more additional users or individuals) identified in the photos (either via optical character recognition from one or more photos or via data values entered manually or received from another device). 
Mehta [0102]  may include information about different employees in a company. 
Mehta [0109] In this scenario, instead of dividing the value evenly, the supervisor may be associated with a higher portion of the value than the employees.
Mehta [0103] Classes may be groups of individuals with similar characteristics, such as lower level employees, supervisors, managers, men, woman, people with red hair, people with blonde hair, etc. ...Examples of different positions include, but are not limited to, employee, supervisor, executive, CEO, CFO, etc.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the data extraction and duplicate detection teachings of Bhatnagar to incorporate the database update teachings of Mehta for “using object recognition techniques to update a database” (Mehta [0001]). The modification would have been obvious because it merely applies a known technique (i.e., database update procedures) to a known concept (i.e., invoice data extraction and duplicate detection) ready for improvement to yield a predictable result (i.e., “Such associations of individuals, objects, and/or data values or portions of data values may be recorded in a database of the system for further tracking and analysis, as well as other uses by the enterprise (including marketing, accounts payable, or other such uses). Users need not manually identify the individuals or objects or associations with the data values, reducing end user effort and reducing non-compliance. Rules may be applied to each association of object, individual, and/or data value, as well as across multiple associations, to automatically detect erroneous or fraudulent entries.” Mehta [0003]).
Regarding Claim 9, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 4 as described earlier.
Bhatnagar teaches,
wherein the one or more actions comprise performing a secondary analysis of the receipt.
(Bhatnagar [0036] the duplicate detection module may provide a report containing a potential duplicate pair. An analyst, which may be a human or a machine, may evaluate the duplicate pairs and model reason codes to provide feedback on the duplicate model's performance.
Bhatnagar [0037] flag the invoice for further review)
Regarding Claim 10, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 9 as described earlier.
Bhatnagar teaches,
wherein the secondary analysis comprises performing an automated process
(Bhatnagar [0036] In step 660, the duplicate detection module may provide a report containing a potential duplicate pair. An analyst, which may be ... a machine, may evaluate the duplicate pairs and model reason codes to provide feedback
Bhatnagar [0037] flag the invoice for further review)
to further analyze the extracted tokens.
(Bhatnagar [0004]  extracting a plurality of key-value pairs from the invoice;)
Regarding Claim 11, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 9 as described earlier.
Bhatnagar teaches,
wherein the secondary analysis comprises performing a manual review of the image.
(Bhatnagar [0036] An analyst, which may be a human ..., may evaluate the duplicate pairs and model reason codes to provide feedback on the duplicate model's performance.
Bhatnagar [0037] flag the invoice for further review
Bhatnagar [0014] processing which includes reading invoices (both pdfs and images))
Regarding Claim 12, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 9 as described earlier.
Bhatnagar teaches,
wherein the secondary analysis comprises determining that the duplicate receipt event comprises a false positive identification of a duplicate receipt.
(Bhatnagar [0036]  the duplicate detection module may provide a report containing a potential duplicate pair. An analyst, which may be a human ..., may evaluate the duplicate pairs and model reason codes to provide feedback on the duplicate model's performance....The analyst may provide feedback on both false positives and false negatives. 
Bhatnagar  [Abstract] identify potential duplicate invoices.)
Regarding Claim 13, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 12 as described earlier.
Bhatnagar teaches,
wherein the secondary analysis comprises determining a condition of the first receipt that caused the false positive identification and configuring a machine learning engine to not identify a future receipt with the condition as a duplicate receipt.
(Bhatnagar [0037] flag the invoice for further review
Bhatnagar [0036] An analyst, which may be a human or a machine …The analyst may provide feedback on both false positives and false negatives. The feedback may be provided back to the duplicate model to update and re-train the duplicate model with the corrected labels for the pairs.
Bhatnagar [0033] may compare corresponding fields across two invoices and compute different similarity measures.
Bhatnagar  [0034]  the duplicate detection module may assign a label {0, 1} to the feature vector corresponding to each pair. The value may be a 1 if the pair is a duplicate pair, or a 0 if the pair is not a duplicate.
Bhatnagar [0049]  Rather, the operations may be machine operations or any of the operations may be conducted or enhanced by artificial intelligence (AI) or machine learning.)
Regarding Claim 14, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 4 as described earlier.
Bhatnagar teaches,
wherein the one or more actions comprise rejecting the expense based on the duplicate receipt event.
(Bhatnagar [0020]  Duplicate invoices may be rejected and returned to the supplier.)
Regarding Claim 15, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 1 as described earlier.
Bhatnagar teaches,
wherein the existing compound key that matches the compound key is associated with a receipt
(Bhatnagar [0004] extracting a plurality of key-value pairs from the invoice;
Bhatnagar [0014] organizing the relevant information in a structured template as a key-value pair
Bhatnagar [0033]  The duplicate detection module may compare corresponding fields across two invoices and compute different similarity measures. 
Bhatnagar [0042] As used herein, “satisfy,” “meet,” “match,” “associated with”, or similar phrases may include an identical match, a partial match, meeting certain criteria, matching a subset of data, a correlation, satisfying certain criteria, a correspondence, an association, an algorithmic relationship, and/or the like. 
Bhatnagar [0014]  invoice processing which includes reading invoices (both pdfs and images))
previously submitted by the user.
(Bhatnagar [0015] A customer (e.g., merchant) may receive invoices from one...vendors
Bhatnagar [0014] processing which includes reading invoices (both pdfs and images)
Bhatnagar [0016] The system may enable detection of duplicate invoices. In this regard, merchants may minimize computing resources used to evaluate whether a paper or digital file has been previously received or processed.)
Regarding Claim 16, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 1 as described earlier.
Bhatnagar teaches,
wherein the existing compound key that matches the compound key is associated with a receipt
(Bhatnagar [0004] extracting a plurality of key-value pairs from the invoice;
Bhatnagar [0014] organizing the relevant information in a structured template as a key-value pair
Bhatnagar [0033]  The duplicate detection module may compare corresponding fields across two invoices and compute different similarity measures. 
Bhatnagar [0042] As used herein, “satisfy,” “meet,” “match,” “associated with”, or similar phrases may include an identical match, a partial match, meeting certain criteria, matching a subset of data, a correlation, satisfying certain criteria, a correspondence, an association, an algorithmic relationship, and/or the like. 
Bhatnagar [0014]  invoice processing which includes reading invoices (both pdfs and images))
submitted by a different user different from the user who provided the image.
(Bhatnagar [0015] A customer (e.g., merchant) may receive invoices from one...vendors
Bhatnagar [0014] processing which includes reading invoices (both pdfs and images)
Bhatnagar [0040] Moreover, any of the functions or steps may be outsourced to or performed by one or more third parties.)
Regarding Claim 17, 
Bhatnagar, Dirac, Oxford, and Mehta teach the duplicate event detection of Claim 16 as described earlier.
Bhatnagar teaches,
wherein the different user is associated with a different entity than the user who provided the image.
(Bhatnagar [0003] across different vendors
Bhatnagar [0015]  invoices from one or more vendors.
Bhatnagar [0015] A customer (e.g., merchant) may receive invoices from one or more vendors.
Bhatnagar [0014] processing which includes reading invoices (both pdfs and images))
Claim 18 is rejected on the same basis as Claim 1.
Claim 19 is rejected on the same basis as Claim 3.
Claim 20 is rejected on the same basis as Claim 1.



Response to Remarks
Applicant's arguments filed on April 28, 2022, have been fully considered, and Examiner’s remarks on Applicant’s amendments follow.





Response Remarks on Claim Rejections - 35 USC § 101
The Applicant states:
“the Present Application is directed to the practical application of using and improving machine learning to improve detection and correction of false positive identification of duplicate receipts. For example, "[t]he auditing system can learn, over time, to better handle false positives so as to not flag as duplicates similar receipts that are actually valid expenses." Present Application at [0113]. Further, the auditing system can learn "to identify other receipt information that may distinguish receipts that may be otherwise equal if just compared based on a certain set of fields historically used for a compound key."
Examiner responds:
Examiner maintains that the invention does not “improve machine learning to improve detection and correction of false positive identification” but rather employs machine learning for the detection and correction of false positive identifications.
The focus of the claims is not on an improvement in machine learning as a tool, but on certain independently abstract ideas that use machine learning as a tool. "Merely requiring the selection and manipulation of information—to provide a ‘humanly comprehensible’ amount of information useful for users by itself does not transform the otherwise-abstract processes of information collection and analysis."
The claims’ invocation of computers and machine learning does not transform the claimed subject matter into patent-eligible applications. The claims at issue do not require any nonconventional computer or machine learning components, or even a “non-conventional and non-generic arrangement of known, conventional pieces,” but merely call for performance of the claimed information collection, analysis, and display functions on a set of generic computer components and machine learning algorithms. For example, the Specification reads, “[0027] various machine learning approaches can be employed to replace and/or augment human auditors.”
Nothing in the claims, understood in light of the specification, requires anything other than off-the-shelf, conventional computer and machine learning technology for gathering, synthesizing, sending, and presenting the desired information. See MPEP 2106.05(d) (well-understood, routine, and conventional).
The entirety of the Applicant’s invention is “Mere data gathering” and “Selecting a particular data source or type of data to be manipulated”; see MPEP 2106.05(g) (Insignificant Extra-Solution Activity).
The Applicant states:
“Because the machine learning of the claimed solution can be configured to handle different entities in different and customized manners, such as by modifying certain parameters and thresholds in specific amounts corresponding to the entity, and in a non-uniform manner…. With the claimed solution, "[m]achine learning audit results can be triggered and reported at various time points, such as while a user is building an expense report. ... More immediate feedback can notify and make users more aware of auditing procedures that are being employed, which can lessen an occurrence of attempted fraudulent submissions."… The Present Application, and specifically the claimed solution, therefore provides a practical - and technical - application for using machine learning to automatically detect duplicate receipts, before a fraudulent expense report is even submitted, thereby saving resources as well as improving fraud detection.."
Examiner responds:
Examiner maintains that the “machine learning” used in the Applicant’s invention is recited at a high level of generality (i.e., as a machine learning model performing a machine learning function) such that it amounts to no more than mere instructions to apply the exception using generic computer components and/or electronic processes. For example, the Applicant’s Specification reads, “[0027] various machine learning approaches can be employed to replace and/or augment human auditors.” Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The additional elements merely add instructions to implement an abstract idea on a computer, or merely use a computer as a tool to perform an abstract idea; see MPEP 2106.05(f). Accordingly, these additional elements, when considered separately and as an ordered combination, do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea and are recited at a high level of generality. Therefore, the claims are directed to an abstract idea without a practical application.
Moreover, the prior art teaches:
“machine learning of the claimed solution can be configured to handle different entities in different and customized manners”:
Dirac [0085] user-defined functions...may be restricted to security containers defined by the provider network... for example the client's machine learning tasks are executed in an isolated, single-tenant fashion ... The term “MLS control plane” may be used herein to refer to a collection of hardware and/or software entities that are responsible for implementing various types of machine learning functionality on behalf of clients of the MLS
“such as by modifying certain parameters and thresholds in specific amounts corresponding to the entity, and in a non-uniform manner”:
Dirac [0068] enabling clients to explore tradeoffs between various prediction quality metric goals, and to modify settings that can be used for interpreting model execution results
Dirac [0156] set of values for each parameter so as to keep the number of combinations that are to be tried below a threshold.
Dirac [0264] triggering conditions are met (e.g., when the number of features for which parameters are stored exceeds a threshold)
Such evidence indicates that the Applicant’s invention is “merely applying” existing machine learning techniques in a manner that is “well-understood, routine, and conventional”; see MPEP 2106.05(d).
Therefore, the rejection under 35 USC § 101 remains.
Response Remarks on Claim Rejections - 35 USC § 103
Applicant's amendments required the application of new/additional prior art.
New prior art includes: 
Dirac (“EFFICIENT DUPLICATE DETECTION FOR MACHINE LEARNING DATA SETS”, U.S. Publication Number: 2015/0379430 A1)
Applicant’s remarks regarding the rejection made under 35 USC § 103 are rendered moot by the introduction of additional prior art.
Therefore, the rejection under 35 USC § 103 remains.
 

Prior Art Cited But Not Applied
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Pugh (“DETECTING DUPLICATE AND NEAR-DUPLICATE FILES”, U.S. Publication Number: 20080162478 A1) proposes improved duplicate and near-duplicate detection techniques that may assign a number of fingerprints to a given document by (i) extracting parts from the document, (ii) assigning the extracted parts to one or more of a predetermined number of lists, and (iii) generating a fingerprint from each of the populated lists. Two documents may be considered to be near-duplicates if any one of their fingerprints match.
Saft (“SYSTEM AND METHODS THEREOF FOR ASSOCIATING ELECTRONIC DOCUMENTS TO EVIDENCE”, U.S. Publication Number: 2019/0057456 A1) proposes determining whether a primary evidence contains required information; extracting at least one distinguishing identifier from the primary evidence upon determining that the primary evidence lacks the required information; and searching a data source for at least one secondary evidence that has an association with the primary evidence based on the at least one distinguishing identifier.
Kaal (“MATCHING INFORMATION ITEMS”, U.S. Publication Number: 2011/0113029 A1) proposes a method of identifying the presence of matching information items in a network that includes using a hashing scheme to generate a set of first hash values from a respective set of first information items stored at a first node and transmitting the set of first hash values over the network to a second node.
Verma (“METHODS AND SYSTEMS FOR AUTOMATICALLY DETECTING FRAUD AND COMPLIANCE ISSUES IN EXPENSE REPORTS AND INVOICES”, U.S. Publication Number: 2016/0358268 A1) proposes a method of detecting anomalies in expense reports of an enterprise that includes the step of implementing a semantic analysis algorithm on expense report data submitted by an employee, wherein the expense report data is provided in a computer-readable format.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHINEDU EKECHUKWU whose telephone number is (571)272-4493.  The examiner can normally be reached on Mon-Fri 9 AM ET to 3:30 PM ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christine Behncke, can be reached on (571) 272-8103.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/C.E./Examiner, Art Unit 3697
/HAO FU/Primary Examiner, Art Unit 3697