DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The present application was filed on September 23, 2020. 
Claims 1-20 are presented for examination and are pending. 

Information Disclosure Statement
The information disclosure statement(s) (IDS) was/were submitted on September 23, 2020 and October 21, 2021. The submission(s) are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement(s) are being considered by the examiner.

Drawings
The drawings filed on September 23, 2020 are accepted. 

Claim Interpretation
A computer program product comprising: one or more computer readable storage media and program instructions collectively stored on the one or more computer readable storage media, the stored program instructions comprising:
as recited in independent claim 8 and dependent claims 9-14 is interpreted to be non-transitory, as mentioned by Paragraph [0051] of the Specification below: 
“A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.”

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Regarding Claim 1, 
Step 1: 
Claim 1 is directed to a method, which is a process, one of the statutory categories. 
Step 2A Prong One: 
Claim 1 recites the following limitations: 
detecting… a correction made to two data records; 
determining… a common attribute between the two data records; 
identifying… a first machine learning model associated with the common attribute; and 
adding… comparison data of the two data records to training data for the machine learning model, wherein the comparison data includes the correction.
This/these limitations require detecting a correction made to two data records (corresponds to observation/evaluation), determining a common attribute between the two data records (corresponds to evaluation), identifying a first machine learning model associated with the common attribute (corresponds to evaluation/judgement), and adding comparison data of the two data records to training data for the machine learning model (corresponds to evaluation with assistance of pen and paper). This falls within the mental process grouping of abstract ideas that can be performed in the human mind, or by a human with pencil and paper. Thus, Claim 1 recites an abstract idea.
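For orientation only, the four recited steps can be read as the following minimal sketch. The record layout, the class and function names, and the use of value equality as the test for a "common attribute" are all assumptions made for illustration; none of them is taken from the claims or the Specification.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    """A link/unlink decision on a pair of records (hypothetical structure)."""
    record_a: dict
    record_b: dict
    action: str  # "link" or "unlink"


@dataclass
class AttributeModel:
    """Stand-in for a per-attribute model; here it only stores training data."""
    attribute: str
    training_data: list = field(default_factory=list)


def common_attributes(a: dict, b: dict) -> list:
    """Attributes present in both records with equal values (assumed definition)."""
    return sorted(k for k in a.keys() & b.keys() if a[k] == b[k])


def record_correction(correction: Correction, models: dict) -> list:
    """Add comparison data for the corrected pair to each matching model."""
    updated = []
    for attr in common_attributes(correction.record_a, correction.record_b):
        model = models.get(attr)  # identify the model tied to this attribute
        if model is None:
            continue
        model.training_data.append({
            "attribute": attr,
            "value": correction.record_a[attr],
            "correction": correction.action,  # comparison data includes the correction
        })
        updated.append(attr)
    return updated
```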

Step 2A Prong Two: 
The abstract idea of Claim 1 is not integrated into a practical application because the additional elements recited in Claim 1 are:
by one or more computer processors
Instructions to apply the abstract idea on generic computer components (computer-implemented) do not represent a practical application of the abstract idea (see MPEP 2106.05(f)). Therefore, claim 1 is directed to an abstract idea. 

Step 2B: 
Finally, the additional elements, taken alone or in combination, do not represent significantly more than the abstract idea itself. Using generic computer components (by one or more computer processors) to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer which cannot provide an inventive concept. Therefore, claim 1 is subject matter ineligible. 

Regarding Claim 2, 
Claim 2 is dependent on claim 1 and only includes additional limitations drawn to mental processes: 
determining… two or more common attributes between the two data records; 
identifying… two or more machine learning models, each associated with one of the two or more common attributes; 
applying… the weight associated with each of the two or more common attributes corresponding to the two or more common attributes; and 
adding… a comparison data of the two data records to training data for the two or more machine learning models, each associated with one of the two or more common attributes, wherein the comparison data includes two or more weighted attributes.
This/these limitations require determining two or more common attributes between the two data records (corresponds to evaluation), identifying two or more machine learning models associated with the common attributes (corresponds to evaluation/judgement), applying the weight associated with each common attribute (corresponds to evaluation with assistance of pen and paper), and adding comparison data of the two data records to training data for the machine learning models (corresponds to evaluation with assistance of pen and paper). 
This claim recites the following additional elements: 
retrieving… a weight associated with each of the two or more common attributes;
which amounts to recitation of insignificant extra-solution activity of data gathering. See MPEP 2106.05(g). Further, MPEP 2106.05(d)(II) notes the following: "The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network)." Accordingly, the additional element does not integrate the abstract idea into a practical application, because the recited insignificant extra-solution activity is well-understood, routine, and conventional. The claim thus remains subject-matter ineligible.

Regarding Claim 3, 
Claim 3 is dependent on claim 2 and only includes additional limitations drawn to mental processes: 
determining… a threshold for the weighted attributes; 
determining… at least one of the weighted probability for each of the two or more common attributes does not meet the threshold; and 
omitting… from training data for the machine learning model associated with the attribute whose weighted probability does not meet the threshold, the weighted probability.
This/these limitations require determining a threshold for weighted attributes (corresponds to evaluation), determining that one of the weighted probability does not meet a threshold (corresponds to evaluation/judgement), and omitting the weighted probability from training data if the weighted probability does not meet the threshold (corresponds to evaluation/judgement with assistance of pen and paper). 
Claim 3 recites additional limitation(s) drawn to mathematical concepts: 
calculating… a weighted probability for each of the two or more common attributes;
This/these limitation(s) require calculating a weighted probability for each common attribute (corresponds to performing mathematical calculations – calculating a weighted probability).
This claim does not recite any additional elements beyond those recited in claim 2, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 4, 
Claim 4 is dependent on claim 2 and only includes additional elements:
receiving… the weight associated with each of the two or more common attributes from a user.
which amounts to recitation of insignificant extra-solution activity of data gathering. See MPEP 2106.05(g). Further, MPEP 2106.05(d)(II) notes the following: "The courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity...i. Receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); TLI Communications LLC v. AV Auto. LLC, 823 F.3d 607, 610, 118 USPQ2d 1744, 1745 (Fed. Cir. 2016) (using a telephone for image transmission); OIP Techs., Inc., v. Amazon.com, Inc., 788 F.3d 1359, 1363, 115 USPQ2d 1090, 1093 (Fed. Cir. 2015) (sending messages over a network); buySAFE, Inc. v. Google, Inc., 765 F.3d 1350, 1355, 112 USPQ2d 1093, 1096 (Fed. Cir. 2014) (computer receives and sends information over a network)." Accordingly, the additional element does not integrate the abstract idea into a practical application, because the recited insignificant extra-solution activity is well-understood, routine, and conventional. The claim thus remains subject-matter ineligible.

Regarding Claim 5, 
Claim 5 is dependent on claim 1 and only includes additional limitations drawn to mental processes: 
wherein the correction is made by a data steward. 
This/these limitations require that the correction is made by a data steward (corresponds to evaluation with assistance of pen and paper). This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 6, 
Claim 6 is dependent on claim 1 and only includes additional limitations drawn to mental processes: 
wherein the correction is selected from the group consisting of linking the two data records and unlinking the two data records. 
This/these limitations require that the correction consists of linking the two data records or unlinking the two data records (corresponds to evaluation). This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 7, 
Claim 7 is dependent on claim 1 and only includes additional limitations drawn to mental processes: 
determining… a number of common attributes between the two data records is greater than one. 
This/these limitations require determining that the number of common attributes between two data records is greater than one (corresponds to observation and evaluation). This claim does not recite any additional elements beyond those recited in claim 1, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. The claim thus remains subject-matter ineligible.

Regarding Claim 8,
Claim 8 is directed to a computer program product, which is directed to an article of manufacture, one of the statutory categories. Claim 8 recites: 
A computer program product comprising: one or more computer readable storage media and program instructions collectively stored on the one or more computer readable storage media, the stored program instructions comprising: 
which performs a process that has limitations similar to the limitations of claim 1. As performing a mental process or abstract idea on a generic computer component cannot integrate the abstract idea into a practical application and cannot provide an inventive concept, Claim 8 remains subject matter ineligible and is rejected with the same rationale applied against claim 1. 

Regarding Claim 9,
Claim 9 is dependent on claim 8 and recites limitations that are similar to the limitations recited in claim 2. This claim does not recite any additional elements beyond those recited in claims 2 and 8, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 9 is rejected with the same rationale applied against claim 2.

Regarding Claim 10,
Claim 10 is dependent on claim 9 and recites limitations that are similar to the limitations recited in claim 3. This claim does not recite any additional elements beyond those recited in claims 3 and 9, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 10 is rejected with the same rationale applied against claim 3.

Regarding Claim 11,
Claim 11 is dependent on claim 9 and recites limitations that are similar to the limitations recited in claim 4. This claim does not recite any additional elements beyond those recited in claims 4 and 9, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 11 is rejected with the same rationale applied against claim 4.

Regarding Claim 12,
Claim 12 is dependent on claim 8 and recites limitations that are similar to the limitations recited in claim 5. This claim does not recite any additional elements beyond those recited in claims 5 and 8, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 12 is rejected with the same rationale applied against claim 5.

Regarding Claim 13,
Claim 13 is dependent on claim 8 and recites limitations that are similar to the limitations recited in claim 6. This claim does not recite any additional elements beyond those recited in claims 6 and 8, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 13 is rejected with the same rationale applied against claim 6.

Regarding Claim 14,
Claim 14 is dependent on claim 8 and recites limitations that are similar to the limitations recited in claim 7. This claim does not recite any additional elements beyond those recited in claims 7 and 8, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 14 is rejected with the same rationale applied against claim 7.

Regarding Claim 15,
Claim 15 is directed to a computer system comprising: one or more processors…, which is directed to a machine, one of the statutory categories. Claim 15 recites: 
A computer system comprising: one or more computer processors; one or more computer readable storage media; program instructions collectively stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the stored program instructions comprising: 
which performs a process that has limitations similar to the limitations of claim 1. As performing a mental process or abstract idea on a generic computer component cannot integrate the abstract idea into a practical application and cannot provide an inventive concept, Claim 15 remains subject matter ineligible and is rejected with the same rationale applied against claim 1.

Regarding Claim 16,
Claim 16 is dependent on claim 15 and recites limitations that are similar to the limitations recited in claim 2. This claim does not recite any additional elements beyond those recited in claims 2 and 15, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 16 is rejected with the same rationale applied against claim 2.

Regarding Claim 17,
Claim 17 is dependent on claim 16 and recites limitations that are similar to the limitations recited in claim 3. This claim does not recite any additional elements beyond those recited in claims 3 and 16, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 17 is rejected with the same rationale applied against claim 3.

Regarding Claim 18,
Claim 18 is dependent on claim 16 and recites limitations that are similar to the limitations recited in claim 4. This claim does not recite any additional elements beyond those recited in claims 4 and 16, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 18 is rejected with the same rationale applied against claim 4.

Regarding Claim 19,
Claim 19 is dependent on claim 15 and recites limitations that are similar to the limitations recited in claim 5. This claim does not recite any additional elements beyond those recited in claims 5 and 15, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 19 is rejected with the same rationale applied against claim 5.

Regarding Claim 20,
Claim 20 is dependent on claim 15 and recites limitations that are similar to the limitations recited in claim 6. This claim does not recite any additional elements beyond those recited in claims 6 and 15, and as such does not recite any additional elements which could integrate the abstract idea into a practical application or be significantly more than the abstract idea. Therefore, Claim 20 is rejected with the same rationale applied against claim 6.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 5-10, 12-17, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Christen (“Automatic Record Linkage using Seeded Nearest Neighbour and Support Vector Machine Classification”) in view of Gschwind et al. (“Fast Record Linkage for Company Entities”). 

Regarding Claim 1, 
Christen teaches: 
A method comprising: detecting, by one or more computer processors, a correction made to two data records; (Section 4, Page 157: “All classifiers are implemented in the Febrl [8] record linkage system, which is written in Python. The libsvm library was used for the SVM classifier [4]. All experiments were run on a 2.13 GHz dual-core CPU with 2 GBytes of main memory, running Linux 2.6.20 and using Python 2.5.1.” teaches a computer-based implementation including one or more computer processors; Fig. 1 and Section 1, Page 152: “When linking two databases, A and B, potentially each record in A should be compared with all records in B. Therefore, the total number of potential record pair comparisons equals |A| × |B|, with | · | denoting the number of records in a database. Similarly, when deduplicating a database, A, the total number of potential record pair comparisons is |A| × (|A| − 1)/2, as each record potentially has to be compared to all others.” teaches linking two databases together and performing record linkage. Record linkage involves deduplication of identical data records in both databases, therefore performing record linkage involves detecting a correction made to two data records from two databases)
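The two comparison counts quoted from Christen can be checked with a short calculation. The sketch below assumes only the formulas in the quoted passage, |A| × |B| for linkage and |A| × (|A| − 1)/2 for deduplication; the function names are illustrative.

```python
def linkage_pairs(size_a: int, size_b: int) -> int:
    """Potential record pair comparisons when linking databases A and B."""
    return size_a * size_b


def dedup_pairs(size_a: int) -> int:
    """Potential record pair comparisons when deduplicating a single database A."""
    # Each unordered pair is counted once, so the product is always even.
    return size_a * (size_a - 1) // 2
```

For example, linking databases of 1,000 and 2,000 records yields 2,000,000 potential comparisons, while deduplicating the 1,000-record database alone yields 499,500.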


determining, by one or more computer processors, a common attribute between the two data records; (Fig. 2 and Section 1, Page 152: “The two records in a candidate pair are compared using similarity functions applied to selected record attributes (fields). These functions can be as simple as an exact string or a numerical comparison, can take typographical variations into account [11], can be specialised for example for date or time values, or they can be as complex as a distance comparison based on look-up tables of geographic locations (longitudes and latitudes). There are also various approaches to learn such similarity functions from training data [3, 11]. Each similarity function returns a numerical matching weight that is usually normalised, such that 1.0 corresponds to exact similarity and 0.0 to total dissimilarity, with attribute values that are somewhat similar having a matching weight somewhere in between 0 and 1. As illustrated in Figure 2, for each compared record pair a weight vector is formed that contains the matching weights calculated for that pair. Using these weight vectors, candidate pairs are classified into matches, non-matches, and possible matches, depending upon the decision model used [9, 15, 17].” teaches comparing two records in a candidate pair by using similarity functions applied to the attributes, in Fig. 2 the weight vector for R1 and R2 shows that Smith, 42, and Main are common attributes with a weight of 1, Christine/Christina and Street/St are similar attributes with weights of 0.9)
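The weight-vector construction described in the quoted passage can be sketched as follows. This is a minimal illustration, not Febrl's implementation: the field names are hypothetical, and `difflib.SequenceMatcher.ratio()` from the Python standard library stands in for the approximate string comparison functions that Christen references, so the partial-match weights it produces will generally differ from those shown in Christen's Fig. 2.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_similarity(a, b) -> float:
    """1.0 for an exact match, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


# One similarity function per compared attribute (hypothetical field names).
COMPARATORS = {
    "given_name": string_similarity,
    "surname": string_similarity,
    "street_number": exact_similarity,
    "street_name": string_similarity,
}


def weight_vector(r1: dict, r2: dict) -> list:
    """Matching weights for one candidate record pair, one entry per attribute."""
    return [compare(r1[name], r2[name]) for name, compare in COMPARATORS.items()]
```

Under this sketch, an attribute with a weight of 1.0 corresponds to what the rejection treats as a common attribute, and a weight strictly between 0 and 1 corresponds to a similar but not identical attribute.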

adding, by one or more computer processors, comparison data of the two data records to training data for [a] machine learning model, wherein the comparison data includes the correction. (Section 1, Page 152-153: “It follows that it is possible to automatically select training examples (weight vectors) from the set of all weight vectors that with high likelihood correspond to true matches or true non-matches, and to then train a supervised binary classifier using these training examples as ‘seeds’. For example, of the weight vectors shown in Figure 2, WV(R1,R2) can be selected as a match training example, while WV(R1,R3) and WV(R2,R3), possibly even WV(R1,R4) and WV(R2,R4), can be used as non-match training examples” teaches using data from the weight vectors (comparison data), which are generated by comparing records, as training data for training a machine learning model, the weight vectors are generated to perform record linkage/data deduplication (includes the correction))

Christen does not appear to explicitly teach: 
identifying, by one or more computer processors, a first machine learning model associated with the common attribute; and

However, Gschwind teaches: 
identifying, by one or more computer processors, a first machine learning model associated with the common attribute; and (Section 3C, Page 626: “The scoring tree uses different scoring algorithms, depending on the type of data to be processed. If the data describes an address, we use a geographic scoring, whereas if it describes a company name, we use a scoring algorithm tuned for company names.” teaches applying a different scoring algorithm based on a type of data that should be processed)
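The type-dependent selection described in the quoted Gschwind passage can be illustrated as a lookup from data type to scoring function. Everything in this sketch is an assumption for illustration: the two scorers are simple stand-ins (a suffix-stripped string comparison and a token-overlap score) and are not Gschwind's geographic or company-name scoring algorithms, and the suffix list is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict

LEGAL_SUFFIXES = {"inc", "ltd", "llc", "corp", "gmbh"}  # hypothetical list


def _tokens(text: str) -> list:
    """Lower-cased whitespace tokens with trailing punctuation removed."""
    return [t.strip(".,").lower() for t in text.split()]


def score_company_name(a: str, b: str) -> float:
    """Compare names after dropping legal-form suffixes (stand-in scorer)."""
    core_a = " ".join(t for t in _tokens(a) if t not in LEGAL_SUFFIXES)
    core_b = " ".join(t for t in _tokens(b) if t not in LEGAL_SUFFIXES)
    return SequenceMatcher(None, core_a, core_b).ratio()


def score_address(a: str, b: str) -> float:
    """Token-overlap (Jaccard) score, standing in for a geographic scorer."""
    set_a, set_b = set(_tokens(a)), set(_tokens(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


SCORERS: Dict[str, Callable[[str, str], float]] = {
    "company_name": score_company_name,
    "address": score_address,
}


def select_scorer(attribute_type: str) -> Callable[[str, str], float]:
    """Identify the scoring function associated with an attribute type."""
    try:
        return SCORERS[attribute_type]
    except KeyError:
        raise ValueError(f"no scorer registered for {attribute_type!r}") from None
```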

Christen and Gschwind are analogous art because they are directed to record linkage.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply Gschwind’s attribute-specific scoring algorithms to the record linkage of Christen, so that a different machine learning model is chosen for analyzing each attribute’s data, with the motivation of accommodating virtually any reference dataset and performing RL on a large variety of entities (Gschwind, Section 3D, Page 627).

Regarding Claim 2, 
The combination of Christen and Gschwind teaches: 
The method of claim 1, 

Christen further teaches: 
further comprising: determining, by one or more computer processors, two or more common attributes between the two data records; (Fig. 2 and Section 1, Page 152: “The two records in a candidate pair are compared using similarity functions applied to selected record attributes (fields). These functions can be as simple as an exact string or a numerical comparison, can take typographical variations into account [11], can be specialised for example for date or time values, or they can be as complex as a distance comparison based on look-up tables of geographic locations (longitudes and latitudes). There are also various approaches to learn such similarity functions from training data [3, 11]. Each similarity function returns a numerical matching weight that is usually normalised, such that 1.0 corresponds to exact similarity and 0.0 to total dissimilarity, with attribute values that are somewhat similar having a matching weight somewhere in between 0 and 1. As illustrated in Figure 2, for each compared record pair a weight vector is formed that contains the matching weights calculated for that pair. Using these weight vectors, candidate pairs are classified into matches, non-matches, and possible matches, depending upon the decision model used [9, 15, 17].” teaches comparing two records in a candidate pair by using similarity functions applied to the attributes, in Fig. 2 the weight vector for R1 and R2 shows that Smith, 42, and Main are common attributes with a weight of 1, Christine/Christina and Street/St are similar attributes with weights of 0.9)


retrieving, by one or more computer processors, a weight associated with each of the two or more common attributes; (Fig. 2 and Section 1, Page 152: “The two records in a candidate pair are compared using similarity functions applied to selected record attributes (fields). These functions can be as simple as an exact string or a numerical comparison, can take typographical variations into account [11], can be specialised for example for date or time values, or they can be as complex as a distance comparison based on look-up tables of geographic locations (longitudes and latitudes). There are also various approaches to learn such similarity functions from training data [3, 11]. Each similarity function returns a numerical matching weight that is usually normalised, such that 1.0 corresponds to exact similarity and 0.0 to total dissimilarity, with attribute values that are somewhat similar having a matching weight somewhere in between 0 and 1. As illustrated in Figure 2, for each compared record pair a weight vector is formed that contains the matching weights calculated for that pair. Using these weight vectors, candidate pairs are classified into matches, non-matches, and possible matches, depending upon the decision model used [9, 15, 17].” teaches that  the weight vector for R1 and R2 shows that Smith, 42, and Main are common attributes with a weight of 1, Christine/Christina and Street/St are similar attributes with weights of 0.9)

applying, by one or more computer processors, the weight associated with each of the two or more common attributes corresponding to the two or more common attributes; and (Section 3.1, Page 154: “Let W be the set of weight vectors that were generated in the comparison step (after blocking has been applied to reduce the total number of detailed record pair comparisons). The aim of the first step of the proposed approach is to select weight vectors from W that with very high likelihood correspond to true matches and true non-matches. The selected weight vectors are inserted into the match seed training examples set, WM , and the non-match seed training examples set, WN , respectively (with WM ∩WN = ∅).” teaches applying the weights associated with the attributes to be used for training machine learning models)

adding, by one or more computer processors, a comparison data of the two data records to training data for the two or more machine learning models, each associated with one of the two or more common attributes, wherein the comparison data includes two or more weighted attributes. (Section 3.1, Page 154: “Let W be the set of weight vectors that were generated in the comparison step (after blocking has been applied to reduce the total number of detailed record pair comparisons). The aim of the first step of the proposed approach is to select weight vectors from W that with very high likelihood correspond to true matches and true non-matches. The selected weight vectors are inserted into the match seed training examples set, WM , and the non-match seed training examples set, WN , respectively (with WM ∩WN = ∅).” teaches selecting weight vectors (comparison data) to be used for training machine learning models; Section 3.2, Page 154: “Once the seed training example sets for matches, WM, and non-matches, WN , have been generated, they can be used to train any binary classifier. In the following two sections, a nearest-neighbour based classifier and an iterative SVM classifier are presented. The set of weight vectors not selected into the seed training example sets will be denoted with WU , with WU = W \ (WM ∪ WN ).” teaches that the weight vectors are used to train multiple machine learning models)
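To illustrate how seed sets such as WM and WN can drive a classifier over the remaining weight vectors WU, the sketch below labels each unselected vector by its nearest seed. It is a deliberately simplified one-pass nearest-neighbour rule using Euclidean distance and is not presented as Christen's seeded nearest-neighbour or iterative SVM procedure; the function names are illustrative.

```python
import math


def nearest_label(vector, match_seeds, non_match_seeds) -> str:
    """Label a weight vector by whichever seed set contains its closest vector."""
    d_match = min(math.dist(vector, seed) for seed in match_seeds)
    d_non_match = min(math.dist(vector, seed) for seed in non_match_seeds)
    return "match" if d_match <= d_non_match else "non-match"


def classify_unlabelled(unlabelled, match_seeds, non_match_seeds) -> list:
    """Classify every weight vector in WU using the seed sets WM and WN."""
    return [nearest_label(v, match_seeds, non_match_seeds) for v in unlabelled]
```

Both seed sets must be non-empty, since `min` raises `ValueError` on an empty sequence.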

Gschwind further teaches: 
identifying, by one or more computer processors, two or more machine learning models, each associated with one of the two or more common attributes; (Section 3C, Page 626: “The scoring tree uses different scoring algorithms, depending on the type of data to be processed. If the data describes an address, we use a geographic scoring, whereas if it describes a company name, we use a scoring algorithm tuned for company names.” teaches applying a different scoring algorithm based on a type of data that should be processed)

The combination of claim 1 has already incorporated the multiple scoring algorithms for different attributes in record linkage, therefore already incorporating the details of the model identification required by claim 2. 

Regarding Claim 3, 
The combination of Christen and Gschwind teaches: 
The method of claim 2, 

Christen further teaches:
further comprising: determining, by one or more computer processors, a threshold for the weighted attributes; (Section 3.1, Page 154: “The selected weight vectors are inserted into the match seed training examples set, WM , and the non-match seed training examples set, WN , respectively (with WM ∩WN = ∅). There are two main approaches to selecting training examples, either using distance thresholds or nearest-based [6].” teaches using a threshold based approach for selecting weight vectors)

calculating, by one or more computer processors, a weighted probability for each of the two or more common attributes; (Fig. 2 and Section 1, Page 152: “As illustrated in Figure 2, for each compared record pair a weight vector is formed that contains the matching weights calculated for that pair. Using these weight vectors, candidate pairs are classified into matches, non-matches, and possible matches, depending upon the decision model used [9, 15, 17].” teaches determining weight vectors that contain weights (weighted probabilities) for common attributes)

determining, by one or more computer processors, at least one of the weighted probability for each of the two or more common attributes does not meet the threshold; and (Section 3.1, Page 154: “The threshold based approach selects weight vectors that have all their matching weights within a certain distance threshold to the exact similarity or total dissimilarity values, respectively. For example, using the weight vectors from Figure 2 and a distance threshold of 0.2, only WV(R1,R2) will be selected into WM, and WV(R1,R3) and WV(R2,R3) into WN . The remaining three weight vectors will not be selected, as at least one of their matching weights is further than the 0.2 distance threshold away from 0 or 1.” teaches that, for a distance threshold of 0.2, WV(R1,R2), WV(R1,R3), and WV(R2,R3) are selected, and that the remaining three weight vectors do not meet the threshold because at least one of their matching weights falls outside it)

omitting, by one or more computer processors, from training data for the machine learning model associated with the attribute whose weighted probability does not meet the threshold, the weighted probability. (Section 3.1, Page 154: “The threshold based approach selects weight vectors that have all their matching weights within a certain distance threshold to the exact similarity or total dissimilarity values, respectively. For example, using the weight vectors from Figure 2 and a distance threshold of 0.2, only WV(R1,R2) will be selected into WM, and WV(R1,R3) and WV(R2,R3) into WN . The remaining three weight vectors will not be selected, as at least one of their matching weights is further than the 0.2 distance threshold away from 0 or 1.” teaches that the weight vectors that do not meet the threshold value are not selected, therefore they are not used to train the machine learning models)
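For illustration only, the threshold based selection quoted above from Christen Section 3.1 can be sketched as follows. This is a minimal sketch, not Christen's implementation: the function name select_seeds is hypothetical, the ordering of the weights in WV(R1,R2) is an assumption, and the weights given for the other two pairs are invented placeholders rather than values taken from Figure 2.

```python
def select_seeds(weight_vectors, threshold):
    """Split weight vectors into match seeds (WM), non-match seeds (WN),
    and unselected vectors (WU) using a distance threshold."""
    match_seeds, non_match_seeds, unselected = {}, {}, {}
    for pair, weights in weight_vectors.items():
        if all(1.0 - w <= threshold for w in weights):
            # Every weight is within the threshold of exact similarity (1.0).
            match_seeds[pair] = weights
        elif all(w <= threshold for w in weights):
            # Every weight is within the threshold of total dissimilarity (0.0).
            non_match_seeds[pair] = weights
        else:
            # At least one weight is too far from both 0 and 1, so the
            # vector is omitted from the seed training data.
            unselected[pair] = weights
    return match_seeds, non_match_seeds, unselected


weight_vectors = {
    # Weights reported for R1/R2 in this action; their order is assumed.
    ("R1", "R2"): [0.9, 1.0, 1.0, 1.0, 0.9],
    # Hypothetical placeholder values, not taken from Figure 2.
    ("R1", "R3"): [0.0, 0.0, 0.0, 0.0, 0.0],
    ("R3", "R4"): [0.6, 0.1, 0.9, 0.0, 0.5],
}

wm, wn, wu = select_seeds(weight_vectors, threshold=0.2)
```

Under these assumed values, only the R1/R2 vector lands in the match seed set, and the last vector is left out of both seed sets because some of its weights are more than 0.2 away from both 0 and 1.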

Regarding Claim 5, 
The combination of Christen and Gschwind teaches: 
The method of claim 1, 

Christen further teaches:
wherein the correction is made by a data steward. (Fig. 1 and Section 1, Page 152: “When linking two databases, A and B, potentially each record in A should be compared with all records in B. Therefore, the total number of potential record pair comparisons equals |A| × |B|, with | · | denoting the number of records in a database. Similarly, when deduplicating a database, A, the total number of potential record pair comparisons is |A| × (|A| − 1)/2, as each record potentially has to be compared to all others.” teaches linking two databases together and performing record linkage. Record linkage involves deduplication of identical data records in both databases; therefore, performing record linkage involves detecting a correction made to two data records from two databases; Section 1, Pages 152-153: “It follows that it is possible to automatically select training examples (weight vectors) from the set of all weight vectors that with high likelihood correspond to true matches or true non-matches, and to then train a supervised binary classifier using these training examples as ‘seeds’.” teaches performing automatic record linkage using a machine learning model (data steward), such as a supervised binary classifier)

Regarding Claim 6, 
The combination of Christen and Gschwind teaches: 
The method of claim 1, 

Christen further teaches:
wherein the correction is selected from the group consisting of linking the two data records and unlinking the two data records. (Fig. 1 and Section 1, Page 152: “When linking two databases, A and B, potentially each record in A should be compared with all records in B. Therefore, the total number of potential record pair comparisons equals |A| × |B|, with | · | denoting the number of records in a database. Similarly, when deduplicating a database, A, the total number of potential record pair comparisons is |A| × (|A| − 1)/2, as each record potentially has to be compared to all others.” teaches performing record linkage to link two data records from two databases together)
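As a brief arithmetic illustration of the comparison counts quoted above from Christen Section 1, the two formulas can be written out as below. The function names are hypothetical and are used only for this sketch.

```python
from itertools import combinations, product


def linkage_comparisons(size_a, size_b):
    """Potential record pair comparisons when linking databases A and B: |A| x |B|."""
    return size_a * size_b


def dedup_comparisons(size_a):
    """Potential record pair comparisons when deduplicating A: |A| x (|A| - 1) / 2."""
    return size_a * (size_a - 1) // 2


# Cross-check the formulas by enumerating the pairs for small databases.
db_a = ["a1", "a2", "a3", "a4"]
db_b = ["b1", "b2", "b3"]
linked_pairs = list(product(db_a, db_b))
dedup_pairs = list(combinations(db_a, 2))
```

Here the enumerated pair lists have the same lengths as the values returned by the two formulas for |A| = 4 and |B| = 3.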

Regarding Claim 7, 
The combination of Christen and Gschwind teaches: 
The method of claim 1, 

Christen further teaches:
further comprising, determining, by one or more computer processors, a number of common attributes between the two data records is greater than one. (Fig. 2 and Section 1, Page 152: “The two records in a candidate pair are compared using similarity functions applied to selected record attributes (fields). These functions can be as simple as an exact string or a numerical comparison, can take typographical variations into account [11], can be specialised for example for date or time values, or they can be as complex as a distance comparison based on look-up tables of geographic locations (longitudes and latitudes). There are also various approaches to learn such similarity functions from training data [3, 11]. Each similarity function returns a numerical matching weight that is usually normalised, such that 1.0 corresponds to exact similarity and 0.0 to total dissimilarity, with attribute values that are somewhat similar having a matching weight somewhere in between 0 and 1. As illustrated in Figure 2, for each compared record pair a weight vector is formed that contains the matching weights calculated for that pair. Using these weight vectors, candidate pairs are classified into matches, non-matches, and possible matches, depending upon the decision model used [9, 15, 17].” teaches comparing two records in a candidate pair by using similarity functions applied to the attributes, in Fig. 2 the weight vector for R1 and R2 shows that Smith, 42, and Main are common attributes with a weight of 1, Christine/Christina and Street/St are similar attributes with weights of 0.9)
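For illustration only, the formation of a weight vector from per-attribute similarity functions, as described in the passage quoted above, might be sketched as follows. The field names, the COMPARATORS mapping, and the use of difflib.SequenceMatcher as the approximate string comparator are all assumptions made for this sketch; Christen does not specify them, and the approximate weights produced here are not expected to equal the 0.9 values shown in Fig. 2.

```python
from difflib import SequenceMatcher


def exact(a, b):
    """Exact comparison: 1.0 for identical values, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def approx(a, b):
    """Approximate string similarity normalised to the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical assignment of one similarity function per attribute.
COMPARATORS = {
    "given_name": approx,
    "surname": exact,
    "street_number": exact,
    "street_name": exact,
    "street_type": approx,
}


def weight_vector(rec_a, rec_b):
    """Return one matching weight per attribute common to both records."""
    return [compare(rec_a[field], rec_b[field]) for field, compare in COMPARATORS.items()]


r1 = {"given_name": "Christine", "surname": "Smith", "street_number": "42",
      "street_name": "Main", "street_type": "Street"}
r2 = {"given_name": "Christina", "surname": "Smith", "street_number": "42",
      "street_name": "Main", "street_type": "St"}

wv = weight_vector(r1, r2)
has_multiple_common_attributes = len(wv) > 1
```

In this sketch the exactly matching attributes (Smith, 42, Main) receive a weight of 1.0, the similar attributes receive a weight strictly between 0 and 1, and the length of the vector reflects the number of common attributes compared.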

Regarding Claim 8,
This claim recites A computer program product comprising: one or more computer readable storage media and program instructions collectively stored on the one or more computer readable storage media, the stored program instructions comprising:…, which performs a plurality of operations as recited by the method of claim 1, and has limitations that are similar to the method of claim 1, thus is rejected with the same rationale applied against claim 1.

Regarding Claim 9,
This claim recites The computer program product of claim 8…, which performs a plurality of operations as recited by the method of claim 2, and has limitations that are similar to the method of claim 2, thus is rejected with the same rationale applied against claim 2.

Regarding Claim 10,
This claim recites The computer program product of claim 9…, which performs a plurality of operations as recited by the method of claim 3, and has limitations that are similar to the method of claim 3, thus is rejected with the same rationale applied against claim 3.

Regarding Claim 12,
This claim recites The computer program product of claim 8…, which performs a plurality of operations as recited by the method of claim 5, and has limitations that are similar to the method of claim 5, thus is rejected with the same rationale applied against claim 5.

Regarding Claim 13,
This claim recites The computer program product of claim 8…, which performs a plurality of operations as recited by the method of claim 6, and has limitations that are similar to the method of claim 6, thus is rejected with the same rationale applied against claim 6.

Regarding Claim 14,
This claim recites The computer program product of claim 8…, which performs a plurality of operations as recited by the method of claim 7, and has limitations that are similar to the method of claim 7, thus is rejected with the same rationale applied against claim 7.

Regarding Claim 15,
This claim recites A computer system comprising: one or more computer processors; one or more computer readable storage media; program instructions collectively stored on the one or more computer readable storage media for execution by at least one of the one or more computer processors, the stored program instructions comprising:…, which performs a plurality of operations as recited by the method of claim 1, and has limitations that are similar to the method of claim 1, thus is rejected with the same rationale applied against claim 1.

Regarding Claim 16,
This claim recites The computer system of claim 15…, which performs a plurality of operations as recited by the method of claim 2, and has limitations that are similar to the method of claim 2, thus is rejected with the same rationale applied against claim 2.

Regarding Claim 17,
This claim recites The computer system of claim 16…, which performs a plurality of operations as recited by the method of claim 3, and has limitations that are similar to the method of claim 3, thus is rejected with the same rationale applied against claim 3.

Regarding Claim 19,
This claim recites The computer system of claim 15…, which performs a plurality of operations as recited by the method of claim 5, and has limitations that are similar to the method of claim 5, thus is rejected with the same rationale applied against claim 5.

Regarding Claim 20,
This claim recites The computer system of claim 15…, which performs a plurality of operations as recited by the method of claim 6, and has limitations that are similar to the method of claim 6, thus is rejected with the same rationale applied against claim 6.

Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Christen in view of Gschwind, further in view of Jurek-Loughrey et al. (“Semi-supervised and Unsupervised Approaches to Record Pairs Classification in Multi-source Data Linkage”). 

Regarding Claim 4, 
The combination of Christen and Gschwind teaches: 
The method of claim 2, 
The combination of Christen and Gschwind does not appear to explicitly teach:
further comprising receiving, by one or more computer processors, the weight associated with each of the two or more common attributes from a user.

However, Jurek-Loughrey teaches: 
further comprising receiving, by one or more computer processors, [a classification] from a user. (Section 2.2, Page 11: “With this approach, all the record pairs are clustered by their comparison vectors. Following this, a number of comparison vectors from each cluster is selected and their corresponding pairs of records are then manually classified by a user as match or non-match and added to the final training set.” teaches that a user can classify record linkages as being a match or non-match, user classifications can be used to train the model)

Christen teaches: 
[a classification that corresponds with] the weight associated with each of the two or more common attributes… (Section 1, Page 152: “Each similarity function returns a numerical matching weight that is usually normalised, such that 1.0 corresponds to exact similarity and 0.0 to total dissimilarity, with attribute values that are somewhat similar having a matching weight somewhere in between 0 and 1.” teaches that a classification of exact similarity corresponds to a weight of 1.0 for these attributes and a classification of total dissimilarity corresponds to a weight of 0.0; therefore, the user classification corresponds to a weight within the weight vector for a pair of attributes)

Christen, Gschwind, and Jurek-Loughrey are analogous art because they are directed to record linkage.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use Jurek-Loughrey’s method to have a user classify a subset of record pairs with the record pair linkage of Christen/Gschwind with a motivation to improve the performance of the machine learning model with manually labeled data. (Jurek-Loughrey Section 2.2, Page 11).

Regarding Claim 11,
This claim recites The computer program product of claim 9…, which performs a plurality of operations as recited by the method of claim 4, and has limitations that are similar to the method of claim 4, thus is rejected with the same rationale applied against claim 4.

Regarding Claim 18,
This claim recites The computer system of claim 16…, which performs a plurality of operations as recited by the method of claim 4, and has limitations that are similar to the method of claim 4, thus is rejected with the same rationale applied against claim 4.

Conclusion
	The prior art made of record and not relied upon is considered pertinent to the applicant’s disclosure: 
	Borthwick (US 20030126102 A1) teaches using probabilistic models to determine if record linkage should be performed for a pair of data entries. 
	Kapoor et al. (US 7287019 B2) teaches performing similarity scoring algorithms to determine if duplicate entries in a database should be removed. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHOUN ABRAHAM whose telephone number is (571)272-8144. The examiner can normally be reached Mon - Fri 08:00-16:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.J.A./Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125