DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are presented for examination (filed on 16 January 2020).

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1, 11 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 1 recites the limitation “receiving, by at least one processor, a first entity record set comprising a plurality of first records… wherein each first entity record of the plurality of first entity records…” in Claim 1, lines 2-6.
There is insufficient antecedent basis for this limitation in the claim.
It is unclear to the examiner if “the plurality of first entity records” is referring to “a plurality of first records.” For example, it is unclear to the examiner if “first records” and “first entity record” are referring to different and distinct things or the same thing.
Claim 1 further recites “receiving, by the at least one processor, a second entity record set comprising a plurality of second records associated with the one or more entities; 
wherein each second entity record of the plurality of second entity records comprise respective second entity data representing identifying information associated with a respective second entity;” in Claim 1, lines 7-11.
It is also unclear to the examiner if “the plurality of second entity records” is referring to “a plurality of second records.” For example, it is unclear to the examiner if “second records” and “second entity record” are referring to different and distinct things or the same thing.
Claims 11 and 20 recite claim limitations similar to the above-mentioned limitations of Claim 1; therefore, they are also rejected for reasons similar to those stated for Claim 1.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites, “A method comprising:
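receiving, by at least one processor, a first entity record set comprising a plurality of first records associated with one or more entities; 
wherein each first entity record of the plurality of first entity records comprise respective first entity data representing identifying information associated with a respective first entity;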
receiving, by the at least one processor, a second entity record set comprising a plurality of second records associated with the one or more entities; 
wherein each second entity record of the plurality of second entity records comprise respective second entity data representing identifying information associated with a respective second entity;
generating, by the at least one processor, candidate pairs based at least in part on a similarity between the first entity data of each first entity record and the second entity data of each second entity record; 
wherein each candidate pair comprises a first entity record of the plurality of first entity records and a second entity record of the plurality of second entity records; 
generating, by the at least one processor, one or more features for each respective candidate pair based at least in a part on a respective at least one similarity measure between a respective first entity record and a respective second entity record;
utilizing, by the at least one processor, a scoring machine learning model to determine a match score for each candidate pair based at least in part on each feature of the one or more features for each candidate pair; 
determining, by the at least one processor, at least one cluster of candidate pairs based on the match score of each feature of the one or more features for each candidate pair;
merging, by the at least one processor, records of candidate pairs of each respective cluster of the at least one cluster into a respective entity record of at least one entity record; 
determining, by the at least one processor, a respective entity associated with each respective entity record; and 
updating, by the at least one processor, an entity database with the at least one entity record.”
(Step 1) The claim recites “A method comprising…” which, as drafted, is a process, which is a statutory category of invention.
NOTE: The limitations “wherein each first entity record of the plurality of first entity records comprise respective first entity data representing identifying information associated with a respective first entity;” “wherein each second entity record of the plurality of second entity records comprise respective second entity data representing identifying information associated with a respective second entity;” and “wherein each candidate pair comprises a first entity record of the plurality of first entity records and a second entity record of the plurality of second entity records” merely disclose data and the type of data and do not meet any of the statutory categories (MPEP: 2106.03, “Thus, the Federal Circuit has held that a product claim to an intangible collection of information, even if created by human effort, does not fall within any statutory category. Digitech, 758 F.3d at 1350, 111 USPQ2d at 1720 (claimed "device profile" comprising two sets of data did not meet any of the categories because it was neither a process nor a tangible product).”).
(Step 2A-Prong One) The limitations of “generating, by the at least one processor, candidate pairs based at least in part on a similarity between the first entity data of each first entity record and the second entity data of each second entity record;
generating, by the at least one processor, one or more features for each respective candidate pair based at least in a part on a respective at least one similarity measure between a respective first entity record and a respective second entity record;
utilizing, by the at least one processor, a scoring machine learning model to determine a match score for each candidate pair based at least in part on each feature of the one or more features for each candidate pair; 
determining, by the at least one processor, at least one cluster of candidate pairs based on the match score of each feature of the one or more features for each candidate pair;
merging, by the at least one processor, records of candidate pairs of each respective cluster of the at least one cluster into a respective entity record of at least one entity record; 
determining, by the at least one processor, a respective entity associated with each respective entity record;” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “processor” and “machine,” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “processor” and “machine” language, “generating,” “utilizing…to determine…,” “determining,” “merging” and “determining” in the context of this claim encompasses the user manually 
“generating, …candidate pairs based at least in part on a similarity between the first entity data of each first entity record and the second entity data of each second entity record; wherein each candidate pair comprises a first entity record of the plurality of first entity records and a second entity record of the plurality of second entity records;
generating, … one or more features for each respective candidate pair based at least in a part on a respective at least one similarity measure between a respective first entity record and a respective second entity record;
utilizing, … a scoring machine learning model to determine a match score for each candidate pair based at least in part on each feature of the one or more features for each candidate pair; 
determining, …at least one cluster of candidate pairs based on the match score of each feature of the one or more features for each candidate pair;
merging, … records of candidate pairs of each respective cluster of the at least one cluster into a respective entity record of at least one entity record; 
determining, … a respective entity associated with each respective entity record” in his mind.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas.
(Step 2A-Prong Two) This judicial exception is not integrated into a practical application. 
In particular, the claim recites additional elements – using “processor” and “machine” to perform the “receiving,” “receiving,” “generating,” “utilizing…to determine…,” “determining,” “merging,” “determining” and “updating” steps. The “processor” and “machine” in these steps are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
Further, the claim recites additional elements – 
“receiving, by at least one processor, a first entity record set comprising a plurality of first records associated with one or more entities;
receiving, by the at least one processor, a second entity record set comprising a plurality of second records associated with the one or more entities;
updating, by the at least one processor, an entity database with the at least one entity record,” which are mere data gathering in the form of insignificant extra-solution activity (MPEP: 2106.05(g), “iv. Obtaining information about transactions using the Internet to verify credit card transactions, CyberSource v. Retail Decisions, Inc.” and “v. Consulting and updating an activity log”).
(Step 2B) The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using “processor” and “machine” to perform the “receiving,” “receiving,” “generating,” “utilizing…to determine…,” “determining,” “merging,” “determining” and “updating” steps amounts to no more than mere instructions to apply the exception using a generic computer component.
The limitation is not sufficient to amount to significantly more than the judicial exception because “receiving,” “receiving” and “updating” only add well-understood, routine and conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception. For example, MPEP 2106.05(d)(II), “i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information)…buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network)," “iii. Electronic recordkeeping, Alice Corp. Pty. Ltd. v. CLS Bank Int'l, 573 U.S. 208, 225, 110 USPQ2d 1984 (2014) (creating and maintaining "shadow accounts"); Ultramercial, 772 F.3d at 716, 112 USPQ2d at 1755 (updating an activity log)” and “iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc”)
Thus, the limitation does not amount to significantly more. Even when considered in combination, these additional elements represent mere instructions to apply an exception and insignificant extra-solution activity, which do not provide an inventive concept. The claim is not patent eligible.
For claim 2, which recites “The method as recited in claim 1, wherein the identifying information comprises: i) an entity name, ii) a secondary entity name, iii) an entity owner, iv) a phone number, v) a geographic location, vi) industry description, vii) franchise indicator, or viii) any combination thereof.” which is merely data (e.g. contents) and does not meet any of the categories (MPEP: 2106.03, “Thus, the Federal Circuit has held that a product claim to an intangible collection of information, even if created by human effort, does not fall within any statutory category. Digitech, 758 F.3d at 1350, 111 USPQ2d at 1720 (claimed "device profile" comprising two sets of data did not meet any of the categories because it was neither a process nor a tangible product).”).
For the above reason, the limitation does not change the result of the analysis from the independent claim 1. 
Therefore, Claim 2 is also rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
For claim 3, which recites “The method as recited in claim 1, wherein the one or more features for each respective candidate pair comprises: i) a similarity between a text or sound of an entity name between each of the respective first entity record and the respective second entity record, ii) a similarity between a location of each of the respective first entity record and the respective second entity record, iii) a similarity between a phone number of each of the respective first entity record and the respective second entity record, iv) a similarity between an entity owner of each of the respective first entity record and the respective second entity record, v) a similarity between an industry description of each of the respective first entity record and the respective second entity record, vi) a match between a franchise indicator of each of the respective first entity record and the respective second entity record, and vii) any combination thereof.”
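For illustration only, the kind of per-pair features recited in claim 3 can be sketched as follows. The record layout (a dictionary with "name", "phone" and "franchise" keys), the function name, and the use of Python's difflib for text similarity are all assumptions made for this sketch and are not taken from the application.

```python
from difflib import SequenceMatcher
from typing import Dict


def pair_features(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, float]:
    """Compute a few illustrative similarity features for one candidate pair.

    The keys "name", "phone" and "franchise" are hypothetical field names.
    """

    def text_similarity(a: str, b: str) -> float:
        # Ratio in [0, 1] based on matching character subsequences.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def digits_only(s: str) -> str:
        # Normalize a phone number by keeping only its digits.
        return "".join(ch for ch in s if ch.isdigit())

    return {
        "name_similarity": text_similarity(first["name"], second["name"]),
        "phone_match": float(digits_only(first["phone"]) == digits_only(second["phone"])),
        "franchise_match": float(first["franchise"] == second["franchise"]),
    }


example = pair_features(
    {"name": "Acme Corp", "phone": "(555) 010-2000", "franchise": "no"},
    {"name": "ACME CORP", "phone": "555-010-2000", "franchise": "no"},
)
```

Each value is a number that a scoring model could consume; which similarity measures the application actually uses is not stated in the quoted claim.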
(Step 2A-Prong One) The limitation of “wherein the one or more features for each respective candidate pair comprises: i) a similarity between a text or sound of an entity name…” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “processor” and “machine,” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “processor” and “machine” language, “…the one or more features for each respective candidate pair comprises: i) a similarity between… ii) a similarity between… iii) a similarity between…iv) a similarity between…v) a similarity between…vi) a match between…” in the context of this claim encompasses the user manually performing “i) a similarity between… ii) a similarity between… iii) a similarity between…iv) a similarity between…v) a similarity between…vi) a match between…” in his mind. 
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. 
Additionally, it also falls in the grouping of “Mathematical Concepts” (e.g. 
(Step 2A-Prong Two) This judicial exception is not integrated into a practical application. 
In particular, the claim recites additional elements – using “processor” and “machine” to perform the “…the one or more features for each respective candidate pair comprises: i) a similarity between… ii) a similarity between… iii) a similarity between…iv) a similarity between…v) a similarity between…vi) a match between…” step. The “processor” and “machine” in the step are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
(Step 2B) The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using “processor” and “machine” to perform the “…the one or more features for each respective candidate pair comprises: i) a similarity between… ii) a similarity between… iii) a similarity between…iv) a similarity between…v) a similarity between…vi) a match between…” step amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
For claim 4, which recites “The method as recited in claim 1, further comprising 
filter, by the at least one processor, the candidate pairs by removing weak pairs; and 
wherein each weak pair comprises a respective match score below a threshold score.”
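For illustration only, the filtering recited in claim 4 can be sketched as follows. The tuple layout (first record identifier, second record identifier, match score) and the function name are hypothetical and are not taken from the application; the sketch treats a pair as weak only when its score is strictly below the threshold, as the claim language states.

```python
from typing import List, Tuple

# Hypothetical representation: (first_record_id, second_record_id, match_score).
CandidatePair = Tuple[str, str, float]


def filter_weak_pairs(pairs: List[CandidatePair], threshold: float) -> List[CandidatePair]:
    """Remove weak pairs, i.e. pairs whose match score is below the threshold."""
    return [pair for pair in pairs if pair[2] >= threshold]


pairs = [("a1", "b1", 0.91), ("a2", "b7", 0.32), ("a3", "b3", 0.50)]
strong = filter_weak_pairs(pairs, threshold=0.5)
```

With a threshold of 0.5, the pair scored 0.32 is removed and the pair scored exactly 0.50 is retained, because it is not below the threshold.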
 (Step 2A-Prong One) The limitation of “filter, by the at least one processor, the candidate pairs by removing weak pairs;” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “processor” and “machine” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “processor” and “machine” language, “…filter, by the at least one processor, the candidate pairs by removing weak pairs…” in the context of this claim encompasses the user manually “filter…the candidate pairs by removing weak pairs; …” in his mind. 
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. 
The limitation of “wherein each weak pair comprises a respective match score below a threshold score;” as drafted, under its broadest reasonable interpretation, falls in the grouping of “Mathematical Concepts” (e.g. calculating “similarity”). Accordingly, the claim recites an abstract idea. 
(Step 2A-Prong Two) This judicial exception is not integrated into a practical application. 
In particular, the claim recites additional elements – using “processor” and “machine” to perform the “…filter, by the at least one processor, the candidate pairs by removing weak pairs;” and “wherein each weak pair comprises a respective match score below a threshold score…” steps. The “processor” and “machine” in the steps are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
(Step 2B) The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using “processor” and “machine” to perform “…filter, by the at least one processor, the candidate pairs by removing weak pairs;” and “wherein each weak pair comprises a respective match score below a threshold score…” steps amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
For claim 5, which recites “The method as recited in claim 4, wherein the threshold score comprises 0.5” which is merely data (e.g. contents) and does not meet any of the categories (MPEP: 2106.03, “Thus, the Federal Circuit has held that a product claim to an intangible collection of information, even if created by human effort, does not fall within any statutory category. Digitech, 758 F.3d at 1350, 111 USPQ2d at 1720 (claimed "device profile" comprising two sets of data did not meet any of the categories because it was neither a process nor a tangible product).”).
For the above reason, the limitation does not change the result of the analysis from the independent claim 1. 
Therefore, Claim 5 is also rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
For claim 6, which recites “The method as recited in claim 1, wherein generating the candidate pairs comprises: 
generating, by the at least one processor, a plurality of first entity-second entity record pairs by pairing each first entity record of the plurality of first entity records with each second entity record of the plurality of second entity records; 
determining, by the at least one processor, a minhash probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs to identify a first set of candidate pairs; 
determining, by the at least one processor, a rule-based probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs based at least in part on a set of similarity measurement rules to identify a second set of candidate pairs; and 
combining, by the at least one processor, the first set of candidate pairs and the second set of candidate pairs to generate the candidate pairs.”
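For illustration only, a minhash estimate of similarity between two records, as referenced in claim 6, can be sketched as follows. The choice of three-character shingles, 64 hash functions, salted BLAKE2b hashing, and all function names are assumptions made for this sketch; the quoted claim does not specify how the minhash probability is computed.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Split a lowercased string into overlapping k-character shingles."""
    t = text.lower()
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def minhash_signature(items: Set[str], num_hashes: int = 64) -> List[int]:
    """For each salted hash function, keep the minimum hash value over the set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def minhash_similarity(a: str, b: str, num_hashes: int = 64) -> float:
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    sig_a = minhash_signature(shingles(a), num_hashes)
    sig_b = minhash_signature(shingles(b), num_hashes)
    return sum(x == y for x, y in zip(sig_a, sig_b)) / num_hashes


same = minhash_similarity("Acme Corp", "Acme Corp")
different = minhash_similarity("Acme Corp", "Zebra Plumbing")
```

Pairs whose estimated similarity exceeds some cutoff would form the first set of candidate pairs; the cutoff itself is not given in the quoted claim.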
(Step 2A-Prong One) The limitations of “generating, by the at least one processor, a plurality of first entity-second entity record pairs by pairing each first entity record of the plurality of first entity records with each second entity record of the plurality of second entity records;” 
“determining, by the at least one processor, a minhash probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs to identify a first set of candidate pairs;” 
“determining, by the at least one processor, a rule-based probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs based at least in part on a set of similarity measurement rules to identify a second set of candidate pairs;” and 
“combining, by the at least one processor, the first set of candidate pairs and the second set of candidate pairs to generate the candidate pairs;” as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. That is, other than reciting “processor” and “machine,” nothing in the claim elements precludes the steps from practically being performed in the mind. For example, but for the “processor” and “machine” language, “generating…by pairing,” “determining,” “determining” and “combining…to generate the candidate pairs;” in the context of this claim encompass the user manually “generating, by the at least one processor, a plurality of first entity-second entity record pairs by pairing each first entity record of the plurality of first entity records with each second entity record of the plurality of second entity records;” “determining, by the at least one processor, a minhash probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs to identify a first set of candidate pairs;” “determining, by the at least one processor, a rule-based probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs based at least in part on a set of similarity measurement rules to identify a second set of candidate pairs;” and “combining, by the at least one processor, the first set of candidate pairs and the second set of candidate pairs to generate the candidate pairs…” in his mind. 
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. 
(Step 2A-Prong Two) This judicial exception is not integrated into a practical application. 
In particular, the claim recites additional elements – using “processor” and “machine” to perform the “…generating, by the at least one processor, a plurality of first entity-second entity record pairs by pairing each first entity record of the plurality of first entity records with each second entity record of the plurality of second entity records;” 
“determining, by the at least one processor, a minhash probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs to identify a first set of candidate pairs;” 
“determining, by the at least one processor, a rule-based probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs based at least in part on a set of similarity measurement rules to identify a second set of candidate pairs;” and 
“combining, by the at least one processor, the first set of candidate pairs and the second set of candidate pairs to generate the candidate pairs…” steps. The “processor” and “machine” in the steps are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
(Step 2B) The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using “processor” and “machine” to perform “…generating, by the at least one processor, a plurality of first entity-second entity record pairs by pairing each first entity record of the plurality of first entity records with each second entity record of the plurality of second entity records;” 
“determining, by the at least one processor, a minhash probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs to identify a first set of candidate pairs;” 
“determining, by the at least one processor, a rule-based probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs based at least in part on a set of similarity measurement rules to identify a second set of candidate pairs;” and 
“combining, by the at least one processor, the first set of candidate pairs and the second set of candidate pairs to generate the candidate pairs…” steps amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
For claim 7, which recites “The method as recited in claim 1, wherein the scoring machine learning model comprises Xgboost.”
(Step 2A-Prong One) The limitation of “scoring machine learning model comprises Xgboost;” as drafted, under its broadest reasonable interpretation, falls in the grouping of “Mathematical Concepts” (e.g. algorithm). Accordingly, the claim recites an abstract idea. 
(Step 2A-Prong Two) This judicial exception is not integrated into a practical application. 
In particular, the claim recites additional elements – using “processor” and “machine” to perform the “…scoring machine learning model comprises Xgboost;” step. The “processor” and “machine” in the step are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
(Step 2B) The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using “processor” and “machine” to perform “…scoring machine learning model comprises Xgboost…” step amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
For claim 8, which recites “The method as recited in claim 1, wherein determining the at least one cluster of candidate pairs comprises: 
utilizing, by the at least one process, graphing algorithms to cluster the candidate pairs.”
(Step 2A-Prong One) The limitation of “utilizing, by the at least one process, graphing algorithms to cluster the candidate pairs;” as drafted, under its broadest reasonable interpretation, falls in the grouping of “Mathematical Concepts” (e.g. algorithm). Accordingly, the claim recites an abstract idea. 
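For illustration only, one graph algorithm that could cluster candidate pairs is connected components, sketched below with a union-find structure. Treating each record as a node and each retained candidate pair as an edge is an assumption made for this sketch; the quoted claim recites only “graphing algorithms” and does not name a particular one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def cluster_pairs(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group record identifiers into connected components of the pair graph."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # union the two components

    groups: Dict[str, Set[str]] = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


clusters = cluster_pairs([("a1", "b1"), ("b1", "a2"), ("a3", "b3")])
```

Here records a1, b1 and a2 fall into one cluster because they are linked through b1, while a3 and b3 form a second cluster.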
(Step 2A-Prong Two) This judicial exception is not integrated into a practical application. 
In particular, the claim recites additional elements – using “processor” and “machine” to perform the “utilizing, by the at least one process, graphing algorithms to cluster the candidate pairs” step. The “processor” and “machine” in the step are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
(Step 2B) The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using “processor” and “machine” to perform “…utilizing, by the at least one process, graphing algorithms to cluster the candidate pairs…” step amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
For claim 9, which recites “The method as recited in claim 1, further comprising, 
determining, by the at least one processor, identifying information associated with a respective first entity or identifying information associated with a respective second entity or both comprising an entity type.”
(Step 2A-Prong One) The limitation of “determining, by the at least one processor, identifying information associated with a respective first entity or identifying information associated with a respective second entity or both comprising an entity type;” as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “processor” and “machine” nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the “processor” and “machine” language, “…determining… identifying information associated with a respective first entity or identifying information associated with a respective second entity or both comprising an entity type…” in the context of this claim encompasses the user manually “determining…identifying information associated with a respective first entity or identifying information associated with a respective second entity or both comprising an entity type” in his mind. 
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. 
(Step 2A-Prong Two) This judicial exception is not integrated into a practical application. 
In particular, the claim recites additional elements – using “processor” and “machine” to perform the “…determining, by the at least one processor, identifying information associated with a respective first entity or identifying information associated with a respective second entity or both comprising an entity type…” step. The “processor” and “machine” in the step are recited at a high-level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
(Step 2B) The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using “processor” and “machine” to perform “…determining, by the at least one processor, identifying information associated with a respective first entity or identifying information associated with a respective second entity or both comprising an entity type…” step amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. The claim is not patent eligible.
For claim 10, which recites “The method as recited in claim 9, wherein the entity type comprises a profession type” which is merely data (e.g. type of data) and does not meet any of the categories (MPEP: 2106.03, “Thus, the Federal Circuit has held that a product claim to an intangible collection of information, even if created by human effort, does not fall within any statutory category. Digitech, 758 F.3d at 1350, 111 USPQ2d at 1720 (claimed "device profile" comprising two sets of data did not meet any of the categories because it was neither a process nor a tangible product).”).
For the above reason, the limitation does not change the result of the analysis from the independent claim 1. 
Therefore, Claim 10 is also rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
For claim 11, it is a method claim having similar limitations as cited in claims 1 and 6-7. Thus, claim 11 is also rejected under the same analysis as explained in the rejection of rejected claims 1 and 6-7.
For claim 12, it is a method claim having similar limitations as cited in claim 2. Thus, claim 12 is also rejected under the same analysis as explained in the rejection of rejected claim 2.
For claim 13, it is a method claim having similar limitations as cited in claim 3. Thus, claim 13 is also rejected under the same analysis as explained in the rejection of rejected claim 3.
For claim 14, it is a method claim having similar limitations as cited in claim 4. Thus, claim 14 is also rejected under the same analysis as explained in the rejection of rejected claim 4.
For claim 15, it is a method claim having similar limitations as cited in claims 1, 5, 6 and 7. Thus, claim 15 is also rejected under the same analysis as explained in the rejection of rejected claims 1, 5, 6 and 7.
For claim 16, it is a method claim having similar limitations as cited in claim 6. Thus, claim 16 is also rejected under the same analysis as explained in the rejection of rejected claim 6.
For claim 17, it is a method claim having similar limitations as cited in claim 8. Thus, claim 17 is also rejected under the same analysis as explained in the rejection of rejected claim 8.
For claim 18, it is a method claim having similar limitations as cited in claim 9. Thus, claim 18 is also rejected under the same analysis as explained in the rejection of rejected claim 9.
For claim 19, it is a method claim having similar limitations as cited in claim 10. Thus, claim 19 is also rejected under the same analysis as explained in the rejection of rejected claim 10.
For claim 20, it is a system claim having similar limitations as cited in claim 1. Thus, claim 20 is also rejected under the same analysis as explained in the rejection of rejected claim 1.
Additionally, claim 20 recites the new claim limitation “an entity database configured to store entity records.”
(Step 2A-Prong Two) This judicial exception is not integrated into a practical application. 
The claim recites an additional element – “store entity records” – which is mere data gathering in the form of insignificant extra-solution activity (MPEP: 2106.05(g), “v. Consulting and updating an activity log”).
Further, the claim recites additional elements – “…store entity records,” which merely describes how to generally “apply” the concept of managing data in a database in a computer environment (MPEP: 2106.05(f)(2), “(2) Whether the claim invokes computers or other machinery merely as a tool to perform an existing process. Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application or provide significantly more.”) which is Mere Instructions To Apply An Exception. The claimed computer components are recited at a high level of generality and are merely invoked as tools to perform an existing database process. Simply implementing the abstract idea on a generic computer is not a practical application of the abstract idea.
(Step 2B) The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.
The limitation is not sufficient to amount to significantly more than the judicial exception because “store” only adds well-understood, routine and conventional activities previously known to the industry, specified at a high level of generality, to the judicial exception. For example, MPEP 2106.05(d)(II), “iii. Electronic recordkeeping, Alice Corp. Pty. Ltd. v. CLS Bank Int'l, 573 U.S. 208, 225, 110 USPQ2d 1984 (2014) (creating and maintaining "shadow accounts"); Ultramercial, 772 F.3d at 716, 112 USPQ2d at 1755 (updating an activity log); iv. Storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015); OIP Techs.,”)
Thus, the limitation does not amount to significantly more. Even when considered in combination, this additional element represents mere instructions to apply an exception.
Further, the claim recites additional elements – “…store entity records,” which merely describes how to generally “apply” the concept of managing data in a database in a computer environment (MPEP: 2106.05(f)(2), “(2) Whether the claim invokes computers or other machinery merely as a tool to perform an existing process. Use of a computer or other machinery in its ordinary capacity for economic or other tasks (e.g., to receive, store, or transmit data) or simply adding a general purpose computer or computer components after the fact to an abstract idea (e.g., a fundamental economic practice or mathematical equation) does not integrate a judicial exception into a practical application or provide significantly more.”) which is Mere Instructions To Apply An Exception. The claimed computer components are recited at a high level of generality and are merely invoked as tools to perform an existing database process. Simply implementing the abstract idea on a generic computer does not provide significantly more.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 3, 4, 6, 8, 9, 10 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Shimanovsky et al. (U.S. Pub. No.: US 20140279757, hereinafter Shimanovsky), in view of Guo et al. (U.S. Pub. No.: US 20170091692, hereinafter Guo). 
For claim 1, Shimanovsky discloses a method comprising:
receiving, by at least one processor, a first entity record set comprising a plurality of first records associated with one or more entities (Shimanovsky: Paragraph [0014], “…This disclosure generally relates to apparatus, systems, and methods for grouping data records associated with entities…” which indicates “a plurality of first records” (e.g. “data records”) are associated with “one or more entities” (“entities”), where “a first entity record set” is broadly interpreted as a part of “data records,”
Paragraph [0018], “The one or more modules are configured to identify at least one pair of data records for which to determine a similarity value, determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records”
Paragraph [0047], “…encourage data sources…” paragraph [0048], “For example, several web pages may reference a restaurant called "The French Laundry". Some may misspell the name of the restaurant, for instance, "The French Luandry," while others may refer to the restaurant as "French Laundry Restaurant".” which indicates records are from different “data sources,” (e.g. first set of “records” is obtained from first “data sources,” and second set of “records” is obtained from second “data sources”),
Paragraph [0049], “…grouping data records based on entities referenced by the data records. The disclosed grouping mechanism can include determining a pair-wise similarity between a large number of data records”
Paragraph [0079], “In step 202, the CR module 112 is configured to receive one or more pairs of data records and determine whether the one or more pairs of data records are eligible for clustering.”
WHERE “a first entity record set comprising a plurality of first records” is broadly interpreted as “data records” or “a large number of data records” from different sources);
wherein each first entity record of the plurality of first entity records comprise respective first entity data representing identifying information associated with a respective first entity (Shimanovsky: Paragraph [0014], “…This disclosure generally relates to apparatus, systems, and methods for grouping data records associated with entities…” Paragraph [0045], “…a data record can include a set of attributes, each attribute designed to convey information about a particular aspect of an entity…”
WHERE “respective first entity data” is broadly interpreted as “attributes” (e.g. one of the “attributes”));
receiving, by the at least one processor, a second entity record set comprising a plurality of second records associated with the one or more entities (Shimanovsky: Paragraph [0014], “…This disclosure generally relates to apparatus, systems, and methods for grouping data records associated with entities…” which indicates “a plurality of second records” (e.g. “data records”) are associated with “one or more entities” (“entities”), where “a second entity record set” is broadly interpreted as another part of “data records,”
Paragraph [0018], “The one or more modules are configured to identify at least one pair of data records for which to determine a similarity value, determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records”
Paragraph [0047], “…encourage data sources…” paragraph [0048], “For example, several web pages may reference a restaurant called "The French Laundry". Some may misspell the name of the restaurant, for instance, "The French Luandry," while others may refer to the restaurant as "French Laundry Restaurant".” which indicates records are from different “data sources,” (e.g. first set of “records” is obtained from first “data sources,” and second set of “records” is obtained from second “data sources”),
Paragraph [0049], “…grouping data records based on entities referenced by the data records. The disclosed grouping mechanism can include determining a pair-wise similarity between a large number of data records”
Paragraph [0079], “In step 202, the CR module 112 is configured to receive one or more pairs of data records and determine whether the one or more pairs of data records are eligible for clustering.”
WHERE “a second entity record set comprising a plurality of second records” is broadly interpreted as “data records” or “a large number of data records” from different sources); 
wherein each second entity record of the plurality of second entity records comprise respective second entity data representing identifying information associated with a respective second entity (Shimanovsky: Paragraph [0014], “…This disclosure generally relates to apparatus, systems, and methods for grouping data records associated with entities…” Paragraph [0045], “…a data record can include a set of attributes, each attribute designed to convey information about a particular aspect of an entity…”
WHERE “respective second entity data” is broadly interpreted as “attributes” (e.g. one of the “attributes”)); 
generating, by the at least one processor, candidate pairs based at least in part on a similarity between the first entity data of each first entity record and the second entity data of each second entity record (Shimanovsky: paragraph [0018], “…to identify at least one pair of data records for which to determine a similarity value, determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records…” Paragraph [0049], “…grouping data records based on entities referenced by the data records. The disclosed grouping mechanism can include determining a pair-wise similarity between a large number of data records”
Paragraph [0080], “…determine the eligibility of a pair of data records by bucketing data records. In particular, the CR module 112 can be configured to bucket data records based on a predetermined set of attributes, referred to as a bucket type, that are likely to be shared by related data records and that are unlikely to be shared by unrelated data records…For example, for restaurant entities, the CR module 112 can use a concatenation of the name attribute and the postcode attribute as one bucket type, and a concatenation of the phone number and the approximate location (e.g., a latitude/longitude coordinate) as a second bucket type. Though such bucketing, the CR module 112 can quickly identify a pair of data records that may refer to the same entity, which are eligible for further processing…”
WHERE “candidate pairs” is broadly interpreted as “identify at least one pair of data records,” 
WHERE “a similarity between the first entity data of each first entity record and the second entity data of each second entity record” is broadly interpreted as “to bucket data records based on a predetermined set of attributes, referred to as a bucket type, that are likely to be shared by related data records” (e.g. “the name attribute and the postcode attribute as one bucket type” or “concatenation of the phone number and the approximate location (e.g., a latitude/longitude coordinate) as a second bucket type”). It would have been obvious to a person of ordinary skill in the art that the “data records” in “one pair of data records” are from the first set, the second set, or both the first and second sets of data records); 
wherein each candidate pair comprises a first entity record of the plurality of first entity records and a second entity record of the plurality of second entity records (Shimanovsky: paragraph [0018], “…to identify at least one pair of data records for which to determine a similarity value, determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records…” Paragraph [0049], “…grouping data records based on entities referenced by the data records. The disclosed grouping mechanism can include determining a pair-wise similarity between a large number of data records”
Paragraph [0080], “…determine the eligibility of a pair of data records by bucketing data records. In particular, the CR module 112 can be configured to bucket data records based on a predetermined set of attributes, referred to as a bucket type, that are likely to be shared by related data records and that are unlikely to be shared by unrelated data records…For example, for restaurant entities, the CR module 112 can use a concatenation of the name attribute and the postcode attribute as one bucket type, and a concatenation of the phone number and the approximate location (e.g., a latitude/longitude coordinate) as a second bucket type. Though such bucketing, the CR module 112 can quickly identify a pair of data records that may refer to the same entity, which are eligible for further processing…”
WHERE “wherein each candidate pair comprises a first entity record…and a second entity record…” is broadly interpreted as “a pair of data records”); 
generating, by the at least one processor, one or more features for each respective candidate pair based at least in a part on a respective at least one similarity measure between a respective first entity record and a respective second entity record (Shimanovsky: “In step 204, the SC module 114 can receive eligible pairs of data records and determine similarities between the eligible pairs of data records. The similarities can be determined by analyzing a similarity of attributes associated with the data records. For example, when two data records are both associated with an entity type "restaurant" and the name associated with the two data records are "Massa" and "Masa", then the similarity between the two data records can be high. In contrast, when one of the pair of data records is associated with an entity type "restaurant" and the other data record is associated with an entity type "airplane," then the similarity between the two data records can be low.”
Paragraph [0084], “The SC module 114 can use a similarity function to determine a similarity value between a pair of data records. A similarity function can include a plurality of sub-functions that produce scores and other features for each attribute and optionally a composite similarity score. The sub-function may include one or more operations per attribute with custom logic for specific entity types. In some embodiments, the sub-function may include simple string similarity computation functions, such as a Levenshtein distance computation function. In some embodiments, the sub-function may include an acronym aware function that is designed to compare the first letter of values associated with attributes…” 
WHERE “generating, by the at least one processor, one or more features for each respective candidate pair” is broadly interpreted as “A similarity function can include a plurality of sub-functions that produce scores and other features for each attribute”); 
utilizing, by the at least one processor, a scoring machine learning model to determine a match score for each candidate pair based at least in part on each feature of the one or more features for each candidate pair (Shimanovsky: paragraph [0025], “…the similarity function is designed to infer an importance of a particular component associated with a particular attribute of a data record, wherein the similarity function is learned by determining differences between components associated with the particular attribute of the training data records, wherein the training data records are known belong to the same cluster, and determining the importance of the particular component based on a number of times the particular component appears in the differences” Paragraph [0086], “…the SC module 114 can learn the similarity function using a machine learning technique. In particular, the SC module 114 can learn the similarity function using a supervised learning technique…” WHERE “a scoring machine learning model” is broadly interpreted as “similarity function is learned” or “machine learning technique.”
Paragraph [0084], “The SC module 114 can use a similarity function to determine a similarity value between a pair of data records. A similarity function can include a plurality of sub-functions that produce scores and other features for each attribute and optionally a composite similarity score. The sub-function may include one or more operations per attribute with custom logic for specific entity types.” WHERE “determine a match score for each candidate pair” is broadly interpreted as “a similarity function to determine a similarity value between a pair of data records. A similarity function can include a plurality of sub-functions that produce scores and other features for each attribute…”
Paragraph [0102], “…A feature can indicate a component (e.g., a value) of an attribute that is shared by the pair of data records…Therefore, when two, previously unseen data records share one of the determined features, the SC module 114 can determine that the two data records can have high similarity”
Paragraph [0110], “In step 306, the SC module 114 can be configured to create a similarity function that can determine similarities between data records based on the features determined in step 304.” Paragraph [0114], “…the SC module 114 can determine a similarity function based on the features and the importance associated with the features.”); 
determining, by the at least one processor, at least one cluster of candidate pairs based on the match score of each feature of the one or more features for each candidate pair (Shimanovsky: paragraph [0133], “In step 206, one the SC computes the pair-wise similarities between eligible pairs of data records, the CC module 116 can receive…similarities between the eligible pairs of data records. Subsequently, the CC module 116 can cluster the received data records based on the similarities.”).
However, Shimanovsky does not explicitly disclose merging, by the at least one processor, records of candidate pairs of each respective cluster of the at least one cluster into a respective entity record of at least one entity record; 
determining, by the at least one processor, a respective entity associated with each respective entity record; and 
updating, by the at least one processor, an entity database with the at least one entity record.
Guo discloses merging, by the at least one processor, records of candidate pairs of each respective cluster of the at least one cluster into a respective entity record of at least one entity record (Guo: Paragraph [0073], “A similar process can be used in situations where the organization name is not missing in an organization record, but rather there is some doubt as to whether the organization name in a particular organization record corresponds to the same organization as is listed in another organization record (e.g., whether they should be combined into a single record). Thus, the name enrichment component 324 may also provide output directly to the clustering/fusing component 326…” Paragraph [0266], “After record linkage, each cluster may have multiple members (i.e., different records from various sources representing the same business entity). Because the sources provide information about an organization with various levels of completeness, truthfulness, and accessibility of presentation, it is common to find conflicts among members within a cluster. Data fusion aims at resolving conflicts and finding the true value of each attribute. In this step, the clusters and their members are taken as the input, and the most representative attribute values are selected to form a golden record...”
WHERE “merging, by the at least one processor, records of candidate pairs of each respective cluster of the at least one cluster into a respective entity record of at least one entity record” is broadly interpreted as “combined into a single record” or “clusters and their members are taken as the input, and the most representative attribute values are selected to form a golden record”); 
determining, by the at least one processor, a respective entity associated with each respective entity record (Guo: Paragraph [0073], “A similar process can be used in situations where the organization name is not missing in an organization record, but rather there is some doubt as to whether the organization name in a particular organization record corresponds to the same organization as is listed in another organization record (e.g., whether they should be combined into a single record). Thus, the name enrichment component 324 may also provide output directly to the clustering/fusing component 326…” paragraph [0163], “Data fusion selects trustworthy attributes from various records that refer to the same organization to form a "golden" record of the organization.” Paragraph [0266], “After record linkage, each cluster may have multiple members (i.e., different records from various sources representing the same business entity). Because the sources provide information about an organization with various levels of completeness, truthfulness, and accessibility of presentation, it is common to find conflicts among members within a cluster. Data fusion aims at resolving conflicts and finding the true value of each attribute. In this step, the clusters and their members are taken as the input, and the most representative attribute values are selected to form a golden record...”
WHERE “a respective entity” is broadly interpreted as “an organization” or “form a "golden" record of the organization”); and 
updating, by the at least one processor, an entity database with the at least one entity record (Guo: Paragraph [0073], “A similar process can be used in situations where the organization name is not missing in an organization record, but rather there is some doubt as to whether the organization name in a particular organization record corresponds to the same organization as is listed in another organization record (e.g., whether they should be combined into a single record). Thus, the name enrichment component 324 may also provide output directly to the clustering/fusing component 326…” Paragraph [0266], “After record linkage, each cluster may have multiple members (i.e., different records from various sources representing the same business entity)…Data fusion aims at resolving conflicts and finding the true value of each attribute. In this step, the clusters and their members are taken as the input, and the most representative attribute values are selected to form a golden record...” 
Paragraph [0054], “The output of the system 300 is enriched organization data stored in a master database 302.” Paragraph [0058], “The complete records 320 may be passed to a clustering/fusing component 326. The clustering/fusing component 326 acts to cluster together records that appear to be similar to each other (i.e., are likely to be duplicates, albeit with different values for some of the fields) and then fuse together records it determines are indeed duplicates of one another. This will be described in more detail below.”
WHERE “updating” is broadly interpreted as “stored,”
WHERE “an entity database” is broadly interpreted as “a master database 302”
WHERE “updating, by the at least one processor, an entity database with the at least one entity record” is broadly interpreted as “output of the system 300 is enriched organization data stored in a master database 302” (e.g. “a golden record” as the output of “clustering/Fusing”, see Fig. 3, item 326 “CLUSTERING/FUSING COMPONENT” and Item 302, “MASTER DATABASE”)).
Additionally, Guo also discloses receiving, by at least one processor, a first entity record set comprising a plurality of first records associated with one or more entities (Guo: Paragraph [0055], “An ingestion platform 304 obtains data from one or more data sources 306A-306C…” Paragraph [0070], “…there are six key fields for organization records: name, address, phone number, organization website (also known as Uniform Resource Locator (URL)), description, and logo…” Paragraph [0266], “After record linkage, each cluster may have multiple members (i.e., different records from various sources representing the same business entity).” These paragraphs indicate “data”/“different records” is obtained from different “data sources”/“various sources” (e.g. first set of “data”/“records” is obtained from first “data sources,” and second set of “data”/“records” is obtained from second “data sources”));
wherein each first entity record of the plurality of first entity records comprise respective first entity data representing identifying information associated with a respective first entity (Guo: Paragraph [0055], “An ingestion platform 304 obtains data from one or more data sources 306A-306C…” Paragraph [0070], “…there are six key fields for organization records: name, address, phone number, organization website (also known as Uniform Resource Locator (URL)), description, and logo…” Paragraph [0266], “After record linkage, each cluster may have multiple members (i.e., different records from various sources representing the same business entity).”
WHERE “each first entity record” is broadly interpreted as “organization records,”
WHERE “respective first entity data” is broadly interpreted as “six key fields for organization records”
WHERE “a respective first entity” is broadly interpreted as “organization”);
receiving, by the at least one processor, a second entity record set comprising a plurality of second records associated with the one or more entities (Guo: Paragraph [0055], “An ingestion platform 304 obtains data from one or more data sources 306A-306C…” Paragraph [0070], “…there are six key fields for organization records: name, address, phone number, organization website (also known as Uniform Resource Locator (URL)), description, and logo…” Paragraph [0266], “After record linkage, each cluster may have multiple members (i.e., different records from various sources representing the same business entity).” These paragraphs indicate that “data”/“different records” are obtained from different “data sources”/“various sources” (e.g. a first set of “data”/“records” is obtained from first “data sources,” and a second set of “data”/“records” is obtained from second “data sources.”));
 wherein each second entity record of the plurality of second entity records comprise respective second entity data representing identifying information associated with a respective second entity (Guo: Paragraph [0055], “An ingestion platform 304 obtains data from one or more data sources 306A-306C…” Paragraph [0070], “…there are six key fields for organization records: name, address, phone number, organization website (also known as Uniform Resource Locator (URL)), description, and logo…” Paragraph [0266], “After record linkage, each cluster may have multiple members (i.e., different records from various sources representing the same business entity).”
WHERE “each second entity record” is broadly interpreted as “organization records,”
WHERE “respective second entity data” is broadly interpreted as “six key fields for organization records”
WHERE “a respective second entity” is broadly interpreted as “organization”); 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to improve upon “APPARATUS, SYSTEMS, AND METHODS FOR GROUPING DATA RECORDS” as taught by Shimanovsky by implementing “INFERRING ATTRIBUTES OF ORGANIZATIONS USING MEMBER ” as taught by Guo, because it would provide Shimanovsky’s method with the enhanced capability of “solving technical challenges in data collection and management. More specifically, the present disclosure relates to the inferring of attributes of organizations using a member graph” (Guo: Paragraph [0002]) and “solving two problems: record linkage and data fusion. Record linkage aims at linking records that refer to the same organization, even if the records do not appear identical. …” (Guo: Paragraph [0163]) in order to “selects trustworthy attributes from various records that refer to the same organization to form a "golden" record of the organization” (Guo: Paragraph [0163]).
For claim 2, Shimanovsky and Guo disclose the method as recited in claim 1, wherein the identifying information comprises: i) an entity name, ii) a secondary entity name, iii) an entity owner, iv) a phone number, v) a geographic location, vi) industry description, vii) franchise indicator, or viii) any combination thereof (Shimanovsky: paragraph [0045], “…a data record can include a set of attributes, each attribute designed to convey information about a particular aspect of an entity. For example, a data record can include an attribute "type of entity," and the value of the attribute can be "restaurant." The data record can also include an attribute "name," and the value of the attribute can be "Le Bernardin." The data record can also include an attribute "a number of years in business," and the value of the attribute can be "42." In ”
Paragraph [0080], “…for restaurant entities, the CR module 112 can use a concatenation of the name attribute and the postcode attribute as one bucket type, and a concatenation of the phone number and the approximate location (e.g., a latitude/longitude coordinate) as a second bucket type.” 
Paragraph [0118], “…the address attribute 506 and the phone number attribute 508…”
WHERE “an entity name” is broadly interpreted as “an attribute "name,"” (e.g. “value of the attribute can be "Le Bernardin."”),
WHERE “a geographic location” is broadly interpreted as “postcode attribute,” “approximate location (e.g., a latitude/longitude coordinate),” or “address attribute”
WHERE “a phone number” is broadly interpreted as “phone number attribute 508”).
For claim 3, Shimanovsky and Guo disclose the method as recited in claim 1, wherein the one or more features for each respective candidate pair comprises: i) a similarity between a text or sound of an entity name between each of the respective first entity record and the respective second entity record, ii) a similarity between a location of each of the respective first entity record and the respective second entity record, iii) a For example, when the similarity function determines a similarity between two data records associated with the entity type "restaurants," the similarity function can take into account, at least in part, the similarity of attributes associated with the data records, such as " name", "location", "average price", "popularity," and/or "years in operation."…”
paragraph [0084], “The SC module 114 can use a similarity function to determine a similarity value between a pair of data records. A similarity function can include a plurality of sub-functions that produce scores and other features for each attribute and optionally a composite similarity score. The sub-function may include one or more operations per attribute with custom logic for specific entity types. In some embodiments, the sub-function may include simple string similarity computation functions, such as a Levenshtein distance computation function. In some embodiments, the sub-function may include an acronym aware function that is designed to compare the first letter of distance function for computing a difference in the latitude and longitude data. These sub-functions can be adapted based on the attributes to which the sub-functions are applied.”
Paragraph [0118], “The two data records 502, 504 are determined…based on the high similarity of components in the address attribute 506 and the phone number attribute 508”
WHERE “a similarity between a text…of an entity name between each of the respective first entity record and the respective second entity record” is broadly interpreted as “…the similarity of attributes associated with the data records, such as "name"…” and “simple string similarity computation functions” (where “text” is interpreted as “string”)
WHERE “a similarity between a location of each of the respective first entity record and the respective second entity record” is broadly interpreted as “the similarity of attributes associated with the data records, such as…"location"”).
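For illustration only, the “simple string similarity computation functions, such as a Levenshtein distance computation function” quoted above can be sketched as follows. This is a minimal sketch and is not drawn from Shimanovsky, Guo, or the instant application; the function names levenshtein and name_similarity, and the normalization by the longer string length, are assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def name_similarity(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical names, 0.0 for fully different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest
```

Under this sketch, a higher name_similarity value corresponds to a closer match between the text of two entity names.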
For claim 4, Shimanovsky and Guo disclose the method as recited in claim 1, further comprising 
filter, by the at least one processor, the candidate pairs by removing weak pairs; and 
wherein each weak pair comprises a respective match score below a threshold score
To this end, the CC module 116 can be configured to adjust one or more parameters for the clustering technique used to clustering data records (e.g., a minimum similarity value for two data records to be in the same cluster)…” paragraph [0152], “In step 806, once the SC module 114 computes the similarity scores between the target record and the one or more data records…the CC module 116 can indicate that the target record does not belong to any one of the candidate clusters because even the highest similarity score is not sufficiently high (e.g., the similarity score is lower than a predetermined threshold).” WHERE “a respective match score below a threshold score” is broadly interpreted as “minimum similarity value for two data records to be in the same cluster” or “not sufficiently high (e.g., the similarity score is lower than a predetermined threshold”).
For claim 6, Shimanovsky and Guo disclose the method as recited in claim 1, wherein generating the candidate pairs comprises: 
generating, by the at least one processor, a plurality of first entity-second entity record pairs by pairing each first entity record of the plurality of first entity records with each second entity record of the plurality of second entity records (Shimanovsky: paragraph [0018], “…to identify at least one pair of data records for which to determine a similarity value, determine the similarity value for the at least one pair of data records based, at least on a plurality of attributes associated with the at least one pair of data records…” Paragraph [0049], “…grouping data records based on entities referenced by the data records. The disclosed grouping mechanism can include determining a pair-wise similarity between a large number of data records”
Paragraph [0080], “…determine the eligibility of a pair of data records by bucketing data records. In particular, the CR module 112 can be configured to bucket data records based on a predetermined set of attributes, referred to as a bucket type, that are likely to be shared by related data records and that are unlikely to be shared by unrelated data records…For example, for restaurant entities, the CR module 112 can use a concatenation of the name attribute and the postcode attribute as one bucket type, and a concatenation of the phone number and the approximate location (e.g., a latitude/longitude coordinate) as a second bucket type. Though such bucketing, the CR module 112 can quickly identify a pair of data records that may refer to the same entity, which are eligible for further processing…”);
determining, by the at least one processor, a minhash probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs to identify a first set of candidate pairs (Shimanovsky: Paragraph [0080], “…determine the eligibility of a pair of data records by bucketing data records. In particular, the CR module 112 can be configured to bucket data records based on a predetermined set of attributes, referred to as a bucket type, that are likely to be shared by related data records and that are unlikely to be shared by unrelated data records…For example, for restaurant entities, the CR module 112 can use a concatenation of the name attribute and the postcode attribute as one bucket type, and a concatenation of the phone number and the approximate location (e.g., a latitude/longitude coordinate) as a second bucket type. Though such bucketing, the CR module 112 can quickly identify a pair of data records that may refer to the same entity, which are eligible for further processing…” Paragraph [0081], “In some embodiments, the CR module 112 can be configured to quickly compare bucket types in data records using a hashing technique. For example, the CR module 112 is configured to convert values of bucket types into hashes (e.g., a set of bits), and is configured to determine that data records whose hashes have more bits in common are more likely to be eligible for further processing. The hashing technique can include any hashing technique that can estimate a similarity between two attributes, such as simhash or minhash.”); 
determining, by the at least one processor, a rule-based probability of similarity between a respective first entity record and a respective second entity record of each first entity-second entity record pair of the plurality of first entity-second entity record pairs based at least in part on a set of similarity measurement rules to identify a second In any one of the embodiments disclosed herein, the similarity function is designed to determine a conditional likelihood that a missing attribute of a data record has a particular component, wherein the conditional likelihood is determined by determining a combination of known attributes corresponding to a particular entity, determining all variations of a missing attribute amongst data records of the particular entity having the combination of known attributes, and determining a conditional probability, based on the variations of the missing attribute, that the missing attribute has a particular component given that the data record has the particular combination of known attributes.”
WHERE “a rule-based probability of similarity” is broadly interpreted as “conditional probability” (where “rule-based” is broadly interpreted as “conditional”)
Paragraph [0080], “…determine the eligibility of a pair of data records by bucketing data records. In particular, the CR module 112 can be configured to bucket data records based on a predetermined set of attributes, referred to as a bucket type, that are likely to be shared by related data records and that are unlikely to be shared by unrelated data records…For example, for restaurant entities, the CR module 112 can use a concatenation of the name attribute and the postcode attribute as one bucket type, and a concatenation of the phone number and the approximate location (e.g., a latitude/longitude coordinate) as a second bucket type. Though such bucketing, the CR module 112 can quickly identify a pair of data records that may refer to the same entity, which are eligible for further processing…” Paragraph [0081], “In some embodiments, the CR module 112 can be configured to quickly compare bucket types in data records using a hashing technique. For example, the CR module 112 is configured to convert values of bucket types into hashes (e.g., a set of bits), and is configured to determine that data records whose hashes have more bits in common are more likely to be eligible for further processing. The hashing technique can include any hashing technique that can estimate a similarity between two attributes, such as simhash or minhash.”); and 
combining, by the at least one processor, the first set of candidate pairs and the second set of candidate pairs to generate the candidate pairs (Shimanovsky: Paragraph [0080], “…determine the eligibility of a pair of data records by bucketing data records. In particular, the CR module 112 can be configured to bucket data records based on a predetermined set of attributes, referred to as a bucket type, that are likely to be shared by related data records and that are unlikely to be shared by unrelated data records…For example, for restaurant entities, the CR module 112 can use a concatenation of the name attribute and the postcode attribute as one bucket type, and a concatenation of the phone number and the approximate location (e.g., a latitude/longitude coordinate) as a second bucket type. Though such bucketing, the CR module 112 can quickly identify a pair of data records that may refer to the same entity, which are eligible for further processing…” Paragraph [0081], “In some embodiments, the CR module 112 can be configured to quickly compare bucket types in data records using a hashing technique. For example, the CR module 112 is configured to convert values of bucket types into hashes (e.g., a set of bits), and is configured to determine that data records whose hashes have more bits in common are more likely to be eligible for further processing. The hashing technique can include any hashing technique that can estimate a similarity between two attributes, such as simhash or minhash.” Paragraph [0084], “The SC module 114 can use a similarity function to determine a similarity value between a pair of data records. A similarity function can include a plurality of sub-functions that produce scores and other features for each attribute and optionally a composite similarity score. The sub-function may include one or more operations per attribute with custom logic for specific entity types. 
In some embodiments, the sub-function may include simple string similarity computation functions, such as a Levenshtein distance computation function. In some embodiments, the sub-function may include an acronym aware function that is designed ”).
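For illustration only, a hashing technique that estimates similarity (“such as simhash or minhash”) and the combining of two candidate sets can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Shimanovsky or of the instant application: the character-trigram shingling, the salted SHA-1 hash family, the signature length of 64, and all function names are hypothetical choices made for the example.

```python
import hashlib


def shingles(text: str, k: int = 3) -> set:
    """Character k-grams of a lowercased string (assumed tokenization)."""
    text = text.lower()
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(items: set, num_hashes: int = 64) -> list:
    """One minimum per salted hash function; the salts stand in for a hash family."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()[:8], "big")
            for item in items))
    return signature


def estimated_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def combine_candidates(first_set: set, second_set: set) -> set:
    """Union of two candidate-pair sets; a pair found by either route is kept once."""
    return first_set | second_set
```

In this sketch, pairs whose estimated similarity exceeds a chosen cutoff would form one candidate set, and combine_candidates merges that set with a set produced by any other rule.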
For claim 8, Shimanovsky and Guo disclose the method as recited in claim 1, wherein determining the at least one cluster of candidate pairs comprises: 
utilizing, by the at least one process, graphing algorithms to cluster the candidate pairs (Shimanovsky: paragraph [0067], “…The cluster computation (CC) module 116 is configured to receive similarity values for pairs of data records, and determine, based on the similarity values, whether to place one or more pairs of data records in the same cluster. In some embodiments, the CC module 116 can use a graph clustering technique to cluster data records based on pairwise similarity values…” WHERE “graphing algorithms” is broadly interpreted as “graph clustering technique”).
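For illustration only, one way to “cluster data records based on pairwise similarity values” using a graph is to treat each candidate pair as an edge and take connected components. Shimanovsky does not name a specific graph clustering technique in the quoted passage, so the use of connected components (via union-find) and the function name cluster_pairs are assumptions made for this sketch.

```python
def cluster_pairs(pairs):
    """Group record identifiers into clusters; each pair is an undirected edge."""
    parent = {}

    def find(x):
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())
```

With this sketch, records linked directly or through a chain of candidate pairs fall into the same cluster.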
For claim 9, Shimanovsky and Guo disclose the method as recited in claim 1, further comprising, 
determining, by the at least one processor, identifying information associated with a respective first entity or identifying information associated with a respective second entity or both comprising an entity type (Shimanovsky: paragraph [0067], “…In some embodiments, the CR module 112 can determine the eligibility of a pair of data records by bucketing data records. In particular, the CR module 112 can be configured to bucket data records based on a predetermined set of attributes, referred to as a bucket type, that are likely to be shared by when two data records have an entity type of "restaurant," then they are likely to be clustered together if the name attribute of the two data records are similar…”
WHERE “both comprising an entity type” is broadly interpreted as “entity type” (e.g. “when two data records have an entity type of "restaurant,"”).
For claim 10, Shimanovsky and Guo disclose the method as recited in claim 9, wherein the entity type comprises a profession type (Shimanovsky: paragraph [0055], “…In some embodiments, the disclosed grouping mechanism can infer how attributes relate to a particular entity type or can determine relative similarity of entities based on domain specific rules associated with a particular entity type…when comparing doctors, it is desirable that the data grouping mechanism can determine that, instead of flavor, specialty, medical school, and the number of years in practice may have the most impact on determining similarities among doctors…” Paragraph [0065], “…when comparing data records corresponding to the entity type "doctor"…”
WHERE “the entity type comprises a profession type” is broadly interpreted as “the entity type "doctor"”).
For claim 20, it is a system claim having similar limitations as cited in claim 1. 
Further, Shimanovsky discloses the additional claim limitation “an entity database configured to store entity records” (Shimanovsky: Paragraph [0150], “In step 802, the coordination module 118 can receive a data record to be added to one of the existing clusters. This data record can be referred to as a target record…the coordination module 118 can use the one or more of attributes of the target record to search…a database to determine one or more candidate clusters for the target record. For example, when a target record is associated with a street address attribute of "123 .mu.m Street," the coordination module 118 can determine that the target record can be associated with the same cluster as the Saint Amour Restaurant of FIG. 5…” which indicates that the “data record” is stored in a “database”, WHERE “an entity database” is broadly interpreted as the “database” which maintains the records (see Fig. 5)).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Shimanovsky et al. (U.S. Pub. No.: US 20140279757, hereinafter Shimanovsky), in view of Guo et al. (U.S. Pub. No.: US 20170091692, hereinafter Guo), and further in view of Wang et al. (“CrowdER: Crowdsourcing Entity Resolution,” 2012, hereinafter Wang). 
For claim 5, Shimanovsky and Guo disclose the method as recited in claim 4. 
However, Shimanovsky and Guo do not explicitly disclose wherein the threshold score comprises 0.5.
Wang discloses wherein the threshold score comprises 0.5 (Wang: Page 2, right column, “Similarity-based techniques require a similarity function and a threshold. The similarity function takes a pair of records as input, and outputs a similarity value. The more similar the two records, the higher the output value. The basic approach is to compute the similarity of all pairs of records. If a pair of records has a similarity value no smaller than the specified threshold, then they are considered to refer to the same entity…For example, in Table 1, suppose that the similarity of two records is specified as Jaccard similarity between their Product Names, and the specified threshold is 0.5…The similarity-based technique will consider (r1, r2) as referring to the same entity since their Jaccard similarity is no smaller than the threshold, i.e., J(r1, r2) ≥ 0.5…”.
WHERE “threshold score comprises 0.5” is broadly interpreted as “the specified threshold is 0.5” (e.g. “referring to the same entity since their Jaccard similarity is no smaller than the threshold, i.e., J(r1, r2) ≥ 0.5”)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to improve upon “APPARATUS, SYSTEMS, AND METHODS FOR GROUPING DATA RECORDS” as taught by Shimanovsky by implementing “CrowdER: Crowdsourcing Entity Resolution” as taught by Wang, because it would provide Shimanovsky’s method with the enhanced capability of “…compute the similarity of all pairs of records. If a pair of records has a similarity value no smaller than the specified threshold, then they are considered to refer to the same entity…” (Wang: Page 2) in order to “find all such duplicate records” (Wang: Page 1).
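For illustration only, the similarity-based technique quoted from Wang (a Jaccard similarity compared against a threshold of 0.5, with a match when the value is “no smaller than the specified threshold”) can be sketched as follows. The whitespace tokenization, the function names, and the sample strings in the usage note are assumptions made for this sketch and are not taken from Wang's Table 1.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the lowercase word sets of two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # assumed convention: two empty names are identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_match(a: str, b: str, threshold: float = 0.5) -> bool:
    """A pair matches when its similarity is no smaller than the threshold."""
    return jaccard(a, b) >= threshold
```

For the hypothetical names "ipad two 16gb white" and "ipad 2nd generation 16gb white", three of six distinct words are shared, so the similarity is exactly 0.5 and the pair is treated as a match; a pair scoring below 0.5 would be removed as a weak pair.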

Claims 7, 11, 12, 13, 14, 16, 17, 18 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Shimanovsky et al. (U.S. Pub. No.: US 20140279757, hereinafter Shimanovsky), in view of Guo et al. (U.S. Pub. No.: US 20170091692, hereinafter Guo), and further in view of Zeng et al. (U.S. Pub. No.: US 20200210771, hereinafter Zeng).
For claim 7, Shimanovsky and Guo disclose the method as recited in claim 1.
However, Shimanovsky and Guo do not explicitly disclose wherein the scoring machine learning model comprises Xgboost.
Zeng discloses wherein the scoring machine learning model comprises Xgboost (Zeng: paragraph [0026], “In a related embodiment, a machine-learned classification model is trained based on labeled data that comprises multiple training instances, each comprising (1) a set of feature values for multiple features pertaining to a pair of records and (2) a label indicating whether the pair of records is a match…For example, a random pair of records in the each field in one record is checked against a corresponding field in the other record to ensure that there is no matching value. Such a pair of records is used as a negative training instance. Example machine learning techniques that may be used to train the model include…XGBoost…”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to improve upon “APPARATUS, SYSTEMS, AND METHODS FOR GROUPING DATA RECORDS” as taught by Shimanovsky by implementing “COMPUTER-BASED SYSTEMS CONFIGURED FOR ENTITY RESOLUTION FOR EFFICIENT DATASET REDUCTION” as taught by Zeng, because it would provide Shimanovsky’s method with the enhanced capability of “…a machine-learned prediction model is trained based on entity data and sub-entity data for the features…The prediction model is evaluated based on accuracy and effectiveness and may be automatically retrained.” (Zeng: paragraph [0010]) and “…a pair of records is used as a negative training instance…” (Zeng: paragraph [0026]) in order to “a machine learning framework to identify high quality entities and track downstream entity conversion are provided…” (Zeng: paragraph [0010]).
For claim 11, it is a method claim having similar limitations as cited in claims 1 and 6-7. Thus, claim 11 is also rejected under the same rationale as cited in the rejections of claims 1 and 6-7.
For claim 12, it is a method claim having similar limitations as cited in claim 2. Thus, claim 12 is also rejected under the same rationale as cited in the rejection of claim 2.
For claim 13, it is a method claim having similar limitations as cited in claim 3. Thus, claim 13 is also rejected under the same rationale as cited in the rejection of claim 3.
For claim 14, it is a method claim having similar limitations as cited in claim 4. Thus, claim 14 is also rejected under the same rationale as cited in the rejection of claim 4.
For claim 16, it is a method claim having similar limitations as cited in claim 6. Thus, claim 16 is also rejected under the same rationale as cited in the rejection of claim 6.
For claim 17, it is a method claim having similar limitations as cited in claim 8. Thus, claim 17 is also rejected under the same rationale as cited in the rejection of claim 8.
For claim 18, it is a method claim having similar limitations as cited in claim 9. Thus, claim 18 is also rejected under the same rationale as cited in the rejection of claim 9.
For claim 19, it is a method claim having similar limitations as cited in claim 10. Thus, claim 19 is also rejected under the same rationale as cited in the rejection of claim 10.

Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Shimanovsky et al. (U.S. Pub. No.: US 20140279757, hereinafter Shimanovsky), in view of Guo et al. (U.S. Pub. No.: US 20170091692, hereinafter Guo), and further in view of Zeng et al. (U.S. Pub. No.: US 20200210771, hereinafter Zeng), and further in view of Wang et al. (“CrowdER: Crowdsourcing Entity Resolution,” 2012, hereinafter Wang).
For claim 15, it is a method claim having similar limitations as cited in claims 1, 5, 6 and 7. Thus, claim 15 is also rejected under the same rationale as cited in the rejections of claims 1, 5, 6 and 7.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YU ZHAO whose telephone number is (571)270-3427.  The examiner can normally be reached on Monday-Friday 9AM-5PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached at 571-272-4046.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  


YU . ZHAO
Examiner
Art Unit 2169



/YU ZHAO/Patent Examiner of Art Unit 2169