DETAILED ACTION
This Office action is in response to the original application filed on 07/02/2020.
Claims 1-20 are pending. Claims 1-20 are rejected.

Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement(s) (IDS) submitted on 07/02/2020 and 10/01/2021 were filed prior to this Office action.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement(s) is/are being considered by the examiner.

Examiner Notes/Objections
FIG. 4, element 412 refers to “candiate” and should likely refer to “candidate.”
¶ 0030 of the instant specification recites “due to poorly managed account” and should likely recite either “a poorly managed account” or “poorly managed accounts.”
Appropriate correction may be required.
Claims 4 and 13 refer to “a distance score greater that the specified value for the distance threshold” and should likely recite “a distance score greater than the specified value for the distance threshold.” Appropriate correction is required.


Statutory Review under 35 USC § 101
Claims 1-9 are directed towards a method and have been reviewed.
Claims 1-9 appear to be non-statutory as the method is directed to an abstract idea without significantly more based on currently known judicial exceptions.
Claims 10-18 are directed toward a system and have been reviewed.
Claims 10-18 initially appear to be statutory under 35 USC § 101, as they invoke 35 U.S.C. 112(f). A claim that properly recites a means-type limitation cannot be software per se because it necessarily includes the processor along with the special programming that accomplishes the function.
However, claims 10-18 perform the method of claims 1-9, which is directed to an abstract idea without significantly more based on currently known judicial exceptions.
Claims 19-20 are directed toward an article of manufacture and have been reviewed.
Claims 19-20 initially appear to be statutory, as the article of manufacture excludes transitory signals.
However, claims 19-20 perform the method of claims 1 and 5, which is directed to an abstract idea without significantly more based on currently known judicial exceptions.

Claim Interpretation - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

Claims 10-18 have been interpreted under 35 U.S.C. 112(f), because they use the generic placeholders “memory” and “processors” coupled with the functional language “stores” and “perform” without reciting sufficient structure to achieve the function.  Furthermore, the generic placeholders are not preceded by a structural modifier.
Since the claim limitation(s) invokes 35 U.S.C. 112(f), claims 10-18 have been interpreted to cover the corresponding structure described in the specification that achieves the claimed function, and equivalents thereof.  
A review of the specification shows that the following appears to be the corresponding structure described in the specification for the 35 U.S.C. 112(f) limitation:
¶ 0085 recites, “As used herein, the term "memory" refers to a machine-readable medium 1238 able to store data temporarily or permanently” and further recites, “The term "machine-readable medium" shall accordingly be taken to include, but not be limited to, one or more data repositories in the form of a solid-state memory (e.g., flash memory), an optical medium, a magnetic medium, other non-volatile memory (e.g., erasable programmable read-only memory (EPROM)), or any suitable combination thereof. The term "machine-readable medium" specifically excludes non-statutory signals per se.”
If applicant wishes to provide further explanation or dispute the examiner’s interpretation of the corresponding structure, applicant must identify the corresponding structure with reference to the specification by page and line number, and to the drawing, if any, by reference characters in response to this Office action.
If applicant does not intend to have the claim limitation(s) treated under 35 U.S.C. 112(f), applicant may amend the claim(s) so that it/they will clearly not invoke 35 U.S.C. 112(f), or present a sufficient showing that the claim recites/recite sufficient structure, material, or acts for performing the claimed function to preclude application of 35 U.S.C. 112(f).
For more information, see MPEP § 2173 et seq. and Supplementary Examination Guidelines for Determining Compliance With 35 U.S.C. 112 and for Treatment of Related Issues in Patent Applications, 76 FR 7162, 7167 (Feb. 9, 2011).

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
The claim(s) recite(s) receiving a desired data field, accessing master data, comparing data to generate scores, selecting data with certain scores, and analyzing data related to the selected data to generate an additional score (based on the previous scores) and providing a corresponding recommendation. This is interpreted as a mental process, drawn to concepts performed in the human mind, including an observation, evaluation, judgment, or opinion.
Claims 6-7 and 15-16 do introduce ranking, which is also a mental process.
This judicial exception is not integrated into a practical application because the claim is further directed to an abstract idea with additional generic computer elements. The generically recited computer elements “a computing system, a memory, one or more processors, etc.” do not add a meaningful limitation to the abstract idea because they amount to simply implementing the abstract idea on a computer.
Further, the claims do recite “using a machine learning model”; however, it is a nominal or token extra-solution component of the claim, and is being considered an attempt to generally link the claims to a particular technological environment.

The claim(s) does/do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, when considered separately and in combination, they do not add significantly more to the exception.
Claims 2 and 11; claims 4 and 13; claims 5, 14, and 20; and claims 9 and 17 do not include any additional elements that would not fall in line with known mental processes.
The claims retrieve information in memory (such as independent claims 1, 10, and 19) and also store and retrieve information in memory (in claims 3 and 12), which are well-understood, routine, conventional computer functions as recognized by the court decisions listed in MPEP § 2106.05(d).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8; 10-15, 17-18; and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Salhin, U.S. Patent Application Publication No. 2020/0243134 (filed January 24, 2020, prior to the instant application date of July 2, 2020; hereinafter Salhin) in view of Bayliss, U.S. Patent No. 7,720,846 (published May 18, 2010; hereinafter Bayliss).

Regarding claim 1, Salhin teaches:
A computer-implemented method comprising: receiving, at a computing system, a requested data field of a new data record; (Salhin FIG. 9, ¶ 0154-0157 describes the claimed 'requested data field' through its amount field predictions: assign an amount metric 256 to the group based on the determined degree of similarity; see the amount metric in play in relation to expected amounts 452-456 of FIG. 14 as described in ¶ 0170: For each output data record an expected day/week of future transaction of that type is predicted, which along with the expected amount field, may be used to help cash flow predictions; Salhin teaches this involving a new data record through ¶ 0175: If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
accessing, by the computing system, master data comprising a plurality of representative data records, each representative data record representing a cluster of similar data records, (Salhin ¶ 0148-0149: if the individual data records in a cluster have textual descriptors “rent”, “rental payments”, “business rental” and “business premises rental”, a clustered textual descriptor may be “business rent” as this applies to all the data records in the cluster; FIG. 10, ¶ 0158 describe outputting a group of temporally-repeating textually matched data records 222 alongside a group descriptor 260: determine a group descriptor 260, and apply the group descriptor 260 to the data record group 222. The group descriptor 260 is a general descriptor for the group of data records applicable to all records in the group 222; see then Salhin teaching that these records are used as input [and thus address the claimed 'accessing'] in FIG. 16, ¶ 0178: A recorded group contains a plurality of data records 272 which exhibit textually similar descriptions, have a temporally repeating nature (e.g. a frequency of data record repeat); assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group)
and each similar data record having a confidence score indicating a confidence level… (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of steps such as those of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group)
comparing the new data record to each representative data record of the plurality of representative data records using a machine learning model to generate a distance score, for each representative data record, corresponding to a distance between the new data record and each representative data record; (Salhin FIGs. 16-17, ¶ 0178:  A method to determine whether to include a new data record 274 into an existing group as illustrated comprises: assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group; the textual group descriptor based on the textual similarity of data records 272 in the group... determining a textual fitting parameter based on the textual group descriptor... determining if a textual portion of a new data record 274 corresponds to the textual fitting parameter; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0176 shows this involving a machine learning model: new data records may be classified to either join existing data records which have been grouped, or to form a new group. The data group output from the unsupervised machine learning model(s) may be used to train a further machine learning model (e.g. a deep learning model) to comprehend the relationship between new data record fields (amount, descriptions, dates, . . . etc.) and the data recorded in the earlier formed group 222)
selecting a set of … data records having a generated distance score within a distance threshold; (Salhin FIG. 15, ¶ 0173: if the data records in the group of data records have a level of textual similarity below the textual similarity threshold, providing the data records for further similarity analysis; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair ... comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records)

analyzing the cluster of similar data records corresponding to each representative data record in … set of representative data records to generate candidate values for the requested data field of the new data record; (Salhin FIG. 8, ¶ 0152 describes the claimed 'cluster of similar data records': the data record amount identifier module 246 may be arranged to identify an amount value of each data record in the group of textually matched data records 246 from the amount portions of the data records. The module 246 therefore outputs a group of data records having identified amounts 248; FIG. 9, ¶ 0154-0156 describes the claimed 'candidate values' for a request involving the 'amount value' field: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity; ¶ 0175 shows that this can involve the claimed 'new data record': If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
generating a candidate score for each of the candidate values using the distance score for the representative data record to which the candidate value corresponds, (Salhin FIG. 9, ¶ 0154-0156 describe the claimed generating of a score for each of the candidate values: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity [see FIG. 9 referring to Amounts as "same, similar, stepwise, or different," which address the claimed 'scores' for the values], and if all the data records in the group of temporally-repeating textually matched data records 248 have a level of temporal, textual and amount value similarity above an overall similarity threshold; see that this involves the claimed 'distance score' through ¶ 0142-0144 teaching textual similarity being calculated based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0143: an edit distance may be used (an edit distance is a way of quantifying how dissimilar two strings (textual descriptors) [shows relevance to 'representative data record'] are from one another))
and the confidence score for the similar data record to which the candidate value corresponds; and (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group [this means the amount value similarity determination occurs after the confidence level check, fulfilling the claimed 'generating a candidate score' occurring as a result of an initial 'confidence score' determination])
providing a recommendation for the requested data field based on the candidate scores for the candidate values. (Salhin describes "requested data field" through ¶ 0044-0048: The system may further comprise a data record amount identifier module arranged to identify an amount value of each data record in the group of textually matched data records from the amount portions of the data records; determine a degree of similarity of the identified amount values of the textually matched data records in the group; and assign an amount metric to the group based on the determined degree of similarity; FIG. 9, ¶ 0155-0156 are relevant to the claimed 'recommendation'; ¶ 0155 describes recommendations: Another example is of a particular stock item (e.g. powder chemical) which is shipped from a warehouse of a manufacturing plant. A future likely shipment may be predicted, and the amount required predicted, to aid future chemical manufacturing to have the required amount in stock; ¶ 0156 shows this being based on the claimed 'candidate scores': an amount metric 256 may be determined to classify the amount value of each data record in a group to be exact, similar (e.g. the same to within ±5%), stepwise (e.g. increasing by a set amount with time, within a predetermined tolerance), or different (e.g. no clear pattern); see also relevant FIG. 14, ¶ 0170 describing expected amounts 452-456 that are the same, similar, or dis-similar, respectively)
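For illustration only, two of the pairwise similarity metrics quoted above from Salhin ¶ 0141-0144 (an edit distance and a cosine distance based on a word count) may be sketched as follows. This is a minimal sketch under stated assumptions and is not Salhin's implementation: the function names `edit_distance` and `cosine_distance` are hypothetical, and lowercased whitespace tokenization is assumed for the word count.

```python
from collections import Counter
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cosine_distance(a: str, b: str) -> float:
    """1 - cosine similarity of word-count vectors (assumed tokenization)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm
```

Under these assumptions, descriptors such as “business rental” and “business premises rental” yield a small cosine distance, while descriptors sharing no words yield a distance of 1.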
Salhin does not expressly disclose a confidence level that the similar data record corresponds to the cluster.
Salhin further does not expressly disclose the bolded limitations seen below:
selecting a set of representative data records having a generated distance score within a distance threshold;
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records…
However, Bayliss addresses this by teaching the following:
each representative data record representing a cluster of similar data records, and each similar data record having a confidence score indicating a confidence level that the similar data record corresponds to the cluster; (Bayliss FIG. 5, col. 19, lines 11-58 describe linked entity references comprising data fields: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other; col. 19, lines 24-34 describes a confidence value assigned to the linkage: a probablistic process whereby a confidence value is assigned to a proposed link between two entity references based on the degree that the field values match optionally adjusted by a weighting factor [the linked references based on matching are interpreted as fulfilling the claimed 'cluster of similar data records']; see the entities being linked to each other in light of FIG. 8, col. 27, lines 31-53 and Table 9)
Bayliss further teaches:
selecting a set of representative data records having a generated distance score within a distance threshold; (Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file. A data field from the subset of data fields included in the entity references is selected (step 706) and the information or value in the selected data field of entity reference A is compared (step 708) to the information or value in the selected data field of entity reference B; col. 27, line 42-col. 28, line 42 and Table 9 describe a graph 800 graphically demonstrating two linked entity references; see then Bayliss col. 37, lines 10-25 showing that this can involve a distance: The "proximity" of entity references to each other may be represented graphically as the distance (i.e., variance) between the entity references at each extreme of each axis representing a selected data field)
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records… (Bayliss FIG. 5, col. 19, lines 10-58: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other [relevant to the claimed 'cluster of similar data records']; see this in light of FIG. 8, col. 27, line 40-col. 28, line 12: Entity references 802-808 represent the entity references of Table 8 having RIDs 1-4, respectively. The arrows between the entity references 802-808 represent the matches determined by performing steps 704-710 in comparing first/middle/last name, SSN, and/or DOB data fields; see steps 704-710 performed in Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file [shows representative data records]; After each of the subset of data fields of the entity reference pair is compared for a match and a record added to the match table 722 when a match occurs, a new entity reference pair may be selected from the master file and steps 706-710 may be repeated for the new entity reference pair) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Bayliss to assign confidence values to similar entity references to improve similar techniques within Salhin to determine similar data records.
In addition, both of the references (Salhin and Bayliss) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to apply the techniques of Bayliss regarding identifying related entity references and assigning a weight-adjustable confidence value to the techniques of Salhin regarding grouping of textually-matched data records. A further motivation is that addressing many of the limitations affecting known attempts at mining, interpreting and understanding data (as in Bayliss col. 2, line 62-col. 3, line 13) would have led one of ordinary skill to combine the prior art teachings regarding similarity of data records to arrive at the claimed invention.
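For illustration only, the sequence of steps recited in claim 1 may be sketched as follows. This is a minimal sketch and is not drawn from the instant specification, Salhin, or Bayliss: the `Cluster` structure, the `distance` and `recommend` functions, the string-similarity ratio standing in for the claimed machine learning model, the default threshold, and the candidate score formula (1 − distance) × confidence are all assumptions made solely to show how the recited scores relate to one another.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Cluster:
    representative: dict  # representative data record for the cluster
    members: list         # (similar data record, confidence score) pairs


def distance(a: dict, b: dict, field: str = "description") -> float:
    # Assumption: a string-similarity ratio stands in for the claimed model.
    return 1.0 - SequenceMatcher(None, a[field], b[field]).ratio()


def recommend(new_record: dict, requested_field: str,
              master_data: list, threshold: float = 0.5):
    """Return the highest-scoring candidate value, or None if none qualify."""
    scores = {}
    for cluster in master_data:
        d = distance(new_record, cluster.representative)  # distance score
        if d > threshold:  # keep representatives within the threshold
            continue
        for record, confidence in cluster.members:
            value = record.get(requested_field)  # candidate value
            if value is None:
                continue
            # Hypothetical candidate score combining distance and confidence.
            score = (1.0 - d) * confidence
            scores[value] = max(scores.get(value, 0.0), score)
    return max(scores, key=scores.get) if scores else None
```

In this sketch a candidate value drawn from a close representative and a high-confidence cluster member outranks one drawn from a low-confidence member of the same cluster.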


Regarding claim 10, Salhin teaches:
A system comprising: a memory that stores instructions, and one or more processors configured by the instructions to perform operations comprising: (Salhin ¶ 0097: a computer, comprising one or more processing devices arranged to, in use, execute computer readable instructions to implement a data record grouping application, wherein the data record grouping application is arranged to perform any method described herein; ¶ 0193: embodiments of the present invention can be realised in the form of hardware, software or a combination of hardware and software. Any such software may be stored in the form of volatile or non-volatile storage such as, for example, a storage device like a ROM, whether erasable or rewritable or not, or in the form of memory; the storage devices and storage media are embodiments of machine-readable storage that are suitable for storing a program or programs that, when executed, implement embodiments of the present invention)
receiving a requested data field of a new data record; (Salhin FIG. 9, ¶ 0154-0157 describes the claimed 'requested data field' through its amount field predictions: assign an amount metric 256 to the group based on the determined degree of similarity; see the amount metric in play in relation to expected amounts 452-456 of FIG. 14 as described in ¶ 0170: For each output data record an expected day/week of future transaction of that type is predicted, which along with the expected amount field, may be used to help cash flow predictions; Salhin teaches this involving a new data record through ¶ 0175: If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
accessing master data comprising a plurality of representative data records, each representative data record representing a cluster of similar data records, (Salhin ¶ 0148-0149: if the individual data records in a cluster have textual descriptors “rent”, “rental payments”, “business rental” and “business premises rental”, a clustered textual descriptor may be “business rent” as this applies to all the data records in the cluster; FIG. 10, ¶ 0158 describe outputting a group of temporally-repeating textually matched data records 222 alongside a group descriptor 260: determine a group descriptor 260, and apply the group descriptor 260 to the data record group 222. The group descriptor 260 is a general descriptor for the group of data records applicable to all records in the group 222; see then Salhin teaching that these records are used as input [and thus address the claimed 'accessing'] in FIG. 16, ¶ 0178: A recorded group contains a plurality of data records 272 which exhibit textually similar descriptions, have a temporally repeating nature (e.g. a frequency of data record repeat); assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group)
and each similar data record having a confidence score indicating a confidence level… (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of steps such as those of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group)
comparing the new data record to each representative data record of the plurality of representative data records using a machine learning model to generate a distance score, for each representative data record, corresponding to a distance between the new data record and each representative data record; (Salhin FIGs. 16-17, ¶ 0178:  A method to determine whether to include a new data record 274 into an existing group as illustrated comprises: assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group; the textual group descriptor based on the textual similarity of data records 272 in the group... determining a textual fitting parameter based on the textual group descriptor... determining if a textual portion of a new data record 274 corresponds to the textual fitting parameter; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0176 shows this involving a machine learning model: new data records may be classified to either join existing data records which have been grouped, or to form a new group. The data group output from the unsupervised machine learning model(s) may be used to train a further machine learning model (e.g. a deep learning model) to comprehend the relationship between new data record fields (amount, descriptions, dates, . . . etc.) and the data recorded in the earlier formed group 222)
selecting a set of … data records having a generated distance score within a distance threshold; (Salhin FIG. 15, ¶ 0173: if the data records in the group of data records have a level of textual similarity below the textual similarity threshold, providing the data records for further similarity analysis; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair ... comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records)
analyzing the cluster of similar data records corresponding to each representative data record in … set of representative data records to generate candidate values for the requested data field of the new data record; (Salhin FIG. 8, ¶ 0152 describes the claimed 'cluster of similar data records': the data record amount identifier module 246 may be arranged to identify an amount value of each data record in the group of textually matched data records 246 from the amount portions of the data records. The module 246 therefore outputs a group of data records having identified amounts 248; FIG. 9, ¶ 0154-0156 describes the claimed 'candidate values' for a request involving the 'amount value' field: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity; ¶ 0175 shows that this can involve the claimed 'new data record': If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
generating a candidate score for each of the candidate values using the distance score for the representative data record to which the candidate value corresponds, (Salhin FIG. 9, ¶ 0154-0156 describe the claimed generating of a score for each of the candidate values: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity [see FIG. 9 referring to Amounts as "same, similar, stepwise, or different," which address the claimed 'scores' for the values], and if all the data records in the group of temporally-repeating textually matched data records 248 have a level of temporal, textual and amount value similarity above an overall similarity threshold; see that this involves the claimed 'distance score' through ¶ 0142-0144 teaching textual similarity being calculated based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0143: an edit distance may be used (an edit distance is a way of quantifying how dissimilar two strings (textual descriptors) [shows relevance to 'representative data record'] are from one another)
and the confidence score for the similar data record to which the candidate value corresponds; and (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group [this means the amount value similarity determination occurs after the confidence level check, fulfilling the claimed 'generating a candidate score' occurring as a result of an initial 'confidence score' determination])
providing a recommendation for the requested data field based on the candidate scores for the candidate values. (Salhin describes "requested data field" through ¶ 0044-0048: The system may further comprise a data record amount identifier module arranged to identify an amount value of each data record in the group of textually matched data records from the amount portions of the data records; determine a degree of similarity of the identified amount values of the textually matched data records in the group; and assign an amount metric to the group based on the determined degree of similarity; FIG. 9, ¶ 0155-0156 are relevant to the claimed 'recommendation'; ¶ 0155 describes recommendations: Another example is of a particular stock item (e.g. powder chemical) which is shipped from a warehouse of a manufacturing plant. A future likely shipment may be predicted, and the amount required predicted, to aid future chemical manufacturing to have the required amount in stock; ¶ 0156 shows this being based on the claimed 'candidate scores': an amount metric 256 may be determined to classify the amount value of each data record in a group to be exact, similar (e.g. the same to within ±5%), stepwise (e.g. increasing by a set amount with time, within a predetermined tolerance), or different (e.g. no clear pattern); see also relevant FIG. 14, ¶ 0170 describing expected amounts 452-456 that are the same, similar, or dis-similar, respectively)
Salhin does not expressly disclose a confidence level that the similar data record corresponds to the cluster.
Salhin further does not expressly disclose the bolded limitations seen below:
selecting a set of representative data records having a generated distance score within a distance threshold;
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records…
However, Bayliss addresses this by teaching the following:
each representative data record representing a cluster of similar data records, and each similar data record having a confidence score indicating a confidence level that the similar data record corresponds to the cluster; (Bayliss FIG. 5, col. 19, lines 11-58 describe linked entity references comprising data fields: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other; col. 19, lines 24-34 describes a confidence value assigned to the linkage: a probablistic process whereby a confidence value is assigned to a proposed link between two entity references based on the degree that the field values match optionally adjusted by a weighting factor [the linked references based on matching are interpreted as fulfilling the claimed 'cluster of similar data records']; see the entities being linked to each other in light of FIG. 8, col. 27, lines 31-53 and Table 9)
Bayliss further teaches:
selecting a set of representative data records having a generated distance score within a distance threshold; (Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file. A data field from the subset of data fields included in the entity references is selected (step 706) and the information or value in the selected data field of entity reference A is compared (step 708) to the information or value in the selected data field of entity reference B; col. 27, line 42-col. 28, line 42 and Table 9 describe a graph 800 graphically demonstrating two linked entity references; see then Bayliss col. 37, lines 10-25 showing that this can involve a distance: The "proximity" of entity references to each other may be represented graphically as the distance (i.e., variance) between the entity references at each extreme of each axis representing a selected data field)
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records… (Bayliss FIG. 5, col. 19, lines 10-58: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other [relevant to the claimed 'cluster of similar data records']; see this in light of FIG. 8, col. 27, line 40-col. 28, line 12: Entity references 802-808 represent the entity references of Table 8 having RIDs 1-4, respectively. The arrows between the entity references 802-808 represent the matches determined by performing steps 704-710 in comparing first/middle/last name, SSN, and/or DOB data fields; see steps 704-710 performed in Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file [shows representative data records]; After each of the subset of data fields of the entity reference pair is compared for a match and a record added to the match table 722 when a match occurs, a new entity reference pair may be selected from the master file and steps 706-710 may be repeated for the new entity reference pair) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Bayliss to assign confidence values to similar entity references to improve similar techniques within Salhin to determine similar data records.
In addition, both of the references (Salhin and Bayliss) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to apply the techniques of Bayliss regarding identifying related entity references and assigning a weight-adjustable confidence value to the techniques of Salhin regarding grouping of textually-matched data records. A further motivation would be to overcome many of the limitations affecting known attempts at mining, interpreting and understanding data (as in Bayliss col. 2, line 62-col. 3, line 13), which would have led one of ordinary skill to combine the prior art reference teachings regarding similarity of data records to arrive at the claimed invention.


Regarding claim 19, Salhin teaches:
A non-transitory computer-readable medium comprising instructions stored thereon that are executable by at least one processor to cause a computing device to perform operations comprising: (Salhin ¶ 0073: computer program code which, when executed by a computer, is arranged to perform any method claimed herein. The computer program code may be stored on a non-transitory computer-readable medium; ¶ 0097: a computer, comprising one or more processing devices arranged to, in use, execute computer readable instructions to implement a data record grouping application, wherein the data record grouping application is arranged to perform any method described herein; ¶ 0193: the storage devices and storage media are embodiments of machine-readable storage that are suitable for storing a program or programs that, when executed, implement embodiments of the present invention)
receiving a requested data field of a new data record; (Salhin FIG. 9, ¶ 0154-0157 describes the claimed 'requested data field' through its amount field predictions: assign an amount metric 256 to the group based on the determined degree of similarity; see the amount metric in play in relation to expected amounts 452-456 of FIG. 14 as described in ¶ 0170: For each output data record an expected day/week of future transaction of that type is predicted, which along with the expected amount field, may be used to help cash flow predictions; Salhin teaches this involving a new data record through ¶ 0175: If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
accessing master data comprising a plurality of representative data records, each representative data record representing a cluster of similar data records, (Salhin ¶ 0148-0149: if the individual data records in a cluster have textual descriptors “rent”, “rental payments”, “business rental” and “business premises rental”, a clustered textual descriptor may be “business rent” as this applies to all the data records in the cluster; FIG. 10, ¶ 0158 describe outputting a group of temporally-repeating textually matched data records 222 alongside a group descriptor 260: determine a group descriptor 260, and apply the group descriptor 260 to the data record group 222. The group descriptor 260 is a general descriptor for the group of data records applicable to all records in the group 222; see then Salhin teaching that these records are used as input [and thus address the claimed 'accessing'] in FIG. 16, ¶ 0178: A recorded group contains a plurality of data records 272 which exhibit textually similar descriptions, have a temporally repeating nature (e.g. a frequency of data record repeat); assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group)
and each similar data record having a confidence score indicating a confidence level… (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of steps such as those of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group)
comparing the new data record to each representative data record of the plurality of representative data records using a machine learning model to generate a distance score, for each representative data record, corresponding to a distance between the new data record and each representative data record; (Salhin FIGs. 16-17, ¶ 0178:  A method to determine whether to include a new data record 274 into an existing group as illustrated comprises: assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group; the textual group descriptor based on the textual similarity of data records 272 in the group... determining a textual fitting parameter based on the textual group descriptor... determining if a textual portion of a new data record 274 corresponds to the textual fitting parameter; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0176 shows this involving a machine learning model: new data records may be classified to either join existing data records which have been grouped, or to form a new group. The data group output from the unsupervised machine learning model(s) may be used to train a further machine learning model (e.g. a deep learning model) to comprehend the relationship between new data record fields (amount, descriptions, dates, . . . etc.) and the data recorded in the earlier formed group 222)
selecting a set of … data records having a generated distance score within a distance threshold; (Salhin FIG. 15, ¶ 0173: if the data records in the group of data records have a level of textual similarity below the textual similarity threshold, providing the data records for further similarity analysis; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair ... comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records)
analyzing the cluster of similar data records corresponding to each representative data record in … set of representative data records to generate candidate values for the requested data field of the new data record; (Salhin FIG. 8, ¶ 0152 describes the claimed 'cluster of similar data records': the data record amount identifier module 246 may be arranged to identify an amount value of each data record in the group of textually matched data records 246 from the amount portions of the data records. The module 246 therefore outputs a group of data records having identified amounts 248; FIG. 9, ¶ 0154-0156 describes the claimed 'candidate values' for a request involving the 'amount value' field: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity; ¶ 0175 shows that this can involve the claimed 'new data record': If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
generating a candidate score for each of the candidate values using the distance score for the representative data record to which the candidate value corresponds, (Salhin FIG. 9, ¶ 0154-0156 describe the claimed generating of a score for each of the candidate values: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity [see FIG. 9 referring to Amounts as "same, similar, stepwise, or different," which address the claimed 'scores' for the values], and if all the data records in the group of temporally-repeating textually matched data records 248 have a level of temporal, textual and amount value similarity above an overall similarity threshold; see that this involves the claimed 'distance score' through ¶ 0142-0144 teaching textual similarity being calculated based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0143: an edit distance may be used (an edit distance is a way of quantifying how dissimilar two strings (textual descriptors) [shows relevance to 'representative data record'] are from one another)
and the confidence score for the similar data record to which the candidate value corresponds; and (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group [this means the amount value similarity determination occurs after the confidence level check, fulfilling the claimed 'generating a candidate score' occurring as a result of an initial 'confidence score' determination])
providing a recommendation for the requested data field based on the candidate scores for the candidate values. (Salhin describes "requested data field" through ¶ 0044-0048: The system may further comprise a data record amount identifier module arranged to identify an amount value of each data record in the group of textually matched data records from the amount portions of the data records; determine a degree of similarity of the identified amount values of the textually matched data records in the group; and assign an amount metric to the group based on the determined degree of similarity; FIG. 9, ¶ 0155-0156 are relevant to the claimed 'recommendation'; ¶ 0155 describes recommendations: Another example is of a particular stock item (e.g. powder chemical) which is shipped from a warehouse of a manufacturing plant. A future likely shipment may be predicted, and the amount required predicted, to aid future chemical manufacturing to have the required amount in stock; ¶ 0156 shows this being based on the claimed 'candidate scores': an amount metric 256 may be determined to classify the amount value of each data record in a group to be exact, similar (e.g. the same to within ±5%), stepwise (e.g. increasing by a set amount with time, within a predetermined tolerance), or different (e.g. no clear pattern); see also relevant FIG. 14, ¶ 0170 describing expected amounts 452-456 that are the same, similar, or dis-similar, respectively)
Salhin does not expressly disclose a confidence level that the similar data record corresponds to the cluster.
Salhin further does not expressly disclose the bolded limitations seen below:
selecting a set of representative data records having a generated distance score within a distance threshold;
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records…
However, Bayliss addresses this by teaching the following:
each representative data record representing a cluster of similar data records, and each similar data record having a confidence score indicating a confidence level that the similar data record corresponds to the cluster; (Bayliss FIG. 5, col. 19, lines 11-58 describe linked entity references comprising data fields: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other; col. 19, lines 24-34 describes a confidence value assigned to the linkage: a probablistic process whereby a confidence value is assigned to a proposed link between two entity references based on the degree that the field values match optionally adjusted by a weighting factor [the linked references based on matching are interpreted as fulfilling the claimed 'cluster of similar data records']; see the entities being linked to each other in light of FIG. 8, col. 27, lines 31-53 and Table 9)
Bayliss further teaches:
selecting a set of representative data records having a generated distance score within a distance threshold; (Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file. A data field from the subset of data fields included in the entity references is selected (step 706) and the information or value in the selected data field of entity reference A is compared (step 708) to the information or value in the selected data field of entity reference B; col. 27, line 42-col. 28, line 42 and Table 9 describe a graph 800 graphically demonstrating two linked entity references; see then Bayliss col. 37, lines 10-25 showing that this can involve a distance: The "proximity" of entity references to each other may be represented graphically as the distance (i.e., variance) between the entity references at each extreme of each axis representing a selected data field)
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records… (Bayliss FIG. 5, col. 19, lines 10-58: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other [relevant to the claimed 'cluster of similar data records']; see this in light of FIG. 8, col. 27, line 40-col. 28, line 12: Entity references 802-808 represent the entity references of Table 8 having RIDs 1-4, respectively. The arrows between the entity references 802-808 represent the matches determined by performing steps 704-710 in comparing first/middle/last name, SSN, and/or DOB data fields; see steps 704-710 performed in Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file [shows representative data records]; After each of the subset of data fields of the entity reference pair is compared for a match and a record added to the match table 722 when a match occurs, a new entity reference pair may be selected from the master file and steps 706-710 may be repeated for the new entity reference pair) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Bayliss to assign confidence values to similar entity references to improve similar techniques within Salhin to determine similar data records.
In addition, both of the references (Salhin and Bayliss) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to apply the techniques of Bayliss regarding identifying related entity references and assigning a weight-adjustable confidence value to the techniques of Salhin regarding grouping of textually-matched data records. A further motivation would be to overcome many of the limitations affecting known attempts at mining, interpreting and understanding data (as in Bayliss col. 2, line 62-col. 3, line 13), which would have led one of ordinary skill to combine the prior art reference teachings regarding similarity of data records to arrive at the claimed invention.

Regarding claims 2 and 11, Salhin in view of Bayliss teaches all the features with respect to claims 1 and 10 above including:
wherein the requested data field is a supplier name or catalog. (Bayliss FIG. 6, col. 25, lines 60-67: in many instances, only a portion of the data fields of the entity references are significant for matching purposes; for information pertaining to people, the following data fields have been found to be significant: SSN; first/middle/last name; street address, street name and state; vendor supplied IDs (i.e., IDs supplied by the data source). Similar significant data fields are often found in other types of databases)

Regarding claims 3 and 12, Salhin in view of Bayliss teaches all the features with respect to claims 1 and 10 above including:
wherein the master data is generated by performing operations comprising: accessing a data source comprising a plurality of data records; (Bayliss col. 3, lines 14-16(13): data records from one or more data sources; FIG. 2, col. 14, lines 49-67: Process 200 typically initiates at preparation phase 202, wherein incoming data is received from one or more data source and formatted to be compatible with the format of the master file, where the master file represents the database upon which queries may be performed. The incoming data can include data from any of a variety of sources and have any of a variety of heterogeneous formats)
grouping the data records into clusters comprising similar data records; (Bayliss col. 3, lines 53-61: records (i.e., entity references); FIG. 2, col. 14, line 49-col. 15, line 37: During the link phase 204, the proposed entity references generated from the incoming data typically are merged into the master file. During this process, the proposed entity references may be linked to a particular entity using one or more matching techniques discussed in greater detail herein. If a proposed entity reference is associated with a pre-existing entity of the master file, the proposed entity reference may be assigned the DID of the entity. Otherwise, a new entity may be created and assigned a new, unique DID and this DID is assigned to the proposed entity reference)
generating a representative data record to represent each cluster of similar data records; (Bayliss col. 5, lines 57-61, "linking entity references to Definitive Identifiers (DIDs)"; see Bayliss col. 13, lines 37-60: each grouping of entity references typically would be assigned or associated with a unique Definitive Identifier (DID). The DID of a grouping would then be appended to each entity reference in the grouping; each entity reference is assigned a unique Reference Identifier (RID))
generating a confidence score for each similar data record indicating the confidence level that the similar data record corresponds to the cluster; and (Bayliss FIG. 15, col. 46, lines 10-58: At step 1508, the information in the selected data field for entity reference A may be compared to the corresponding information for entity reference B. If there is a match (i.e., equivalence or some acceptable degree of similarity), the score of the corresponding entity pair record in the score table 1522 may be adjusted (e.g., increased) at step 1510)
storing the master data comprising the plurality of representative data records in one or more data stores. (Bayliss describes its master file existing within a database in col. 60, lines 4-20: an exemplary database management system 1800 for processing queries to the master file and/or other databases; Bayliss also teaches this in col. 63, lines 22-53: consider a query to a state's motor vehicle registration database (one example of a master file) resulting in a list of all registered drivers who own a black automobile, sorted by last name; see all this in light of col. 18, lines 55-65: The DID may be associated with the entity reference, for example, by appending the DID as a DID data field to the record of the master file representing the entity reference, by creating a separate file having as records the RIDs of the entity references and their corresponding DIDs, and the like)


Regarding claims 4 and 13, Salhin in view of Bayliss teaches all the features with respect to claims 1 and 10 above including:
wherein the distance threshold is a specified value and the set of representative data records are selected based on each representative record of the set of representative data records having a distance score greater that the specified value for the distance threshold. (Salhin FIGs. 16-17, ¶ 0178 describe group descriptors as relevant to the claimed 'representative record': A method to determine whether to include a new data record 274 into an existing group as illustrated comprises: assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group; the textual group descriptor based on the textual similarity of data records 272 in the group... determining a textual fitting parameter based on the textual group descriptor... determining if a textual portion of a new data record 274 corresponds to the textual fitting parameter; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair ... comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0148: Clusters of textually similar data records 214 may be stored in the textually matched data record log 216 with the data records having a textual similarity above a threshold 208 from the data record textual match analysis module 206)
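For illustration only, the threshold-based selection discussed above can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the instant application, Salhin, or Bayliss: the function names (edit_distance, similarity_score, select_representatives), the normalization of the edit distance into a score between 0 and 1, and the example threshold are all hypothetical. It assumes a score derived from an edit distance, as mentioned in Salhin ¶ 0143, where a higher score indicates closer text, and it keeps the records whose score is greater than the specified value.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein edit distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity_score(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings, lower as they diverge."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def select_representatives(new_text: str, representatives: list[str],
                           threshold: float) -> list[str]:
    """Keep the representative descriptors whose score exceeds the threshold."""
    return [r for r in representatives
            if similarity_score(new_text, r) > threshold]


if __name__ == "__main__":
    # Hypothetical descriptors and threshold, chosen only to exercise the sketch.
    print(select_representatives("business rent",
                                 ["business rental", "payroll"], 0.5))
```

Other distance measures named in the cited passages, such as a cosine distance over word counts, could be substituted for the edit distance in this sketch without changing the selection step.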


Regarding claims 5, 14, and 20, Salhin in view of Bayliss teaches all the features with respect to claims 1, 10, and 19 above respectively including:
wherein generating the candidate score for each of the candidate values using the distance score for the representative data record to which the candidate value corresponds and the confidence score for the similar data record to which the candidate value corresponds comprises: for each similar data record, multiplying the distance score by the confidence score to generate a multiplied value; (Bayliss FIG. 17, col. 56, line 53-col. 57, line 37: The match score of each record of the outlier match table 1722 may be multiplied by the confidence level assigned to the match rule that resulted in the creation of the record. Using the previous example, a record having a match score of 80 resulting from the match rule of "gender=male" would be multiplied by a confidence level of 95% to generate a resulting total score of 76; see that the 'match score' of Bayliss is relevant to the claimed 'distance score' in col. 37, lines 1-25: due to their close "proximity" and match characteristics, entity references 1202-1208 may be reasonably linked to entity A; The "proximity" of entity references to each other may be represented graphically as the distance (i.e., variance) between the entity references at each extreme of each axis representing a selected data field)
for each candidate value, combining the multiplied value of each similar data record comprising the candidate value to generate a combined multiplied value; (Bayliss FIG. 17, col. 56, line 53-col. 57, line 38 describe the multiplied values: the additional semantic content of the data is applied at step 1708 by scoring the DIDs represented in the outlier match table 1722 based on the grading criteria. The match score of each record of the outlier match table 1722 may be multiplied by the confidence level assigned to the match rule that resulted in the creation of the record; Bayliss col. 57, lines 39-53 describe the combined multiplied value: At step 1710, the total match score for each entity reference associated with a particular score are, for instance, summed to generate an overall DID match score. The DID/DID match score pair are then added to a DID score table 1726)
determining a count of the number of similar data records in which each candidate value appears; and (Bayliss col. 57, lines 54-67 shows relevance to the claimed 'data records': Table 42 represents an exemplary master file generated from a motor vehicle registration database [this table is used to generate/derive subsequent Tables 43-46]; col. 58, line 20-col. 59, line 47, Tables 42-46 describe counts through its row numbers corresponding to DIDs, see especially Tables 45-46 (where DID #1 has four entries, DID #2 has two entries, etc.))
generating the candidate score for each of the candidate values based on the combined multiplied value and the count for the candidate value. (Bayliss col. 57, lines 39-53 describe scores for each DID: The DID/DID match score pair are then added to a DID score table 1726. At step 1712, the DID score table 1726 is filtered by a predetermined threshold value to obtain those DIDs of the DID score table 1726 having the highest total match score; col. 59, lines 1-47: Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs. In this example, Table 47 is filtered by taking only the highest scored DID: DID 1 having a total score of 162 (represented by row 1 of Table 44 [should say Table 47]); Bayliss teaches in col. 59 and Tables 46-47 a total score of 162 for DID 1 in Table 47, derived from a summation of Total Scores 100, 30, and 32 [three of the four initial rows corresponding to DID #1] from Table 46, further derived by multiplying 100 by 100%, 60 by 50%, and 80 by 40% respectively)
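For illustration only, the arithmetic in the cited Bayliss example (match scores of 100, 60, and 80 multiplied by confidence levels of 100%, 50%, and 40%, then summed to 162) can be sketched as below. The function name and the row layout are assumptions for this example. The sketch reports the combined multiplied value and the count side by side, since how the two are merged into a single candidate score is not stated in the passages cited above.

```python
from collections import defaultdict


def combine_scores(rows):
    """rows: iterable of (candidate, match_score, confidence_pct).

    Returns {candidate: (combined_multiplied_value, record_count)}.
    Confidence is an integer percentage so the products stay exact.
    """
    combined = defaultdict(float)
    counts = defaultdict(int)
    for candidate, match_score, confidence_pct in rows:
        # Multiply the match score by the confidence level of its match rule.
        combined[candidate] += match_score * confidence_pct / 100
        # Count how many records contributed to this candidate.
        counts[candidate] += 1
    return {c: (combined[c], counts[c]) for c in combined}


# The three rows for DID 1 discussed above: 100 + 30 + 32.
rows = [("DID 1", 100, 100), ("DID 1", 60, 50), ("DID 1", 80, 40)]
result = combine_scores(rows)
```

The same helper reproduces the single-row example cited for the multiplying step, in which a match score of 80 at a 95% confidence level yields 76.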

Regarding claims 6 and 15, Salhin in view of Bayliss teaches all the features with respect to claims 1 and 10 above including:
ranking the candidate values based on the candidate score; and (Bayliss col. 57, line 38 to col. 59, line 47: Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs; see Bayliss showing these DIDs corresponding to different entries in Table 42; see Bayliss ranking the DIDs by "Total Score" in Table 47, "162/148/140/120/110")
wherein the recommendation for the requested data field is provided based on the ranked candidate values. (Bayliss col. 57, lines 55-57: a law enforcement agency is seeking to identify a person in connection with a crime; col. 59, lines 21-47: At step 1714, the entity references from Table 45 associated with DID 1 may be used to identify the sought-after person. Table 48 illustrates these entity references. From this table, the law enforcement agency can ascertain that the sought person goes by the names "David Hobbson," "D Hoobson" and "David Yates." It also can be determined that this person resides in the zip code 33445 and possibly in the zip code 33555)

Regarding claim 17, Salhin in view of Bayliss teaches all the features with respect to claim 15 above including:
wherein providing the recommendation for the requested data field based on the ranked candidate values comprises providing a subset of the candidate values with the highest candidate score based on a predetermined number of recommendations to be provided. (Bayliss col. 57, lines 41-54 describe a predetermined number: At step 1712, the DID score table 1726 is filtered by a predetermined threshold value to obtain those DIDs of the DID score table 1726 having the highest total match score. The threshold may be set at a certain minimum total match score, as a certain top percentage, as a certain number of the highest total match scores [shows the claimed 'number of recommendations'], and the like || Bayliss col. 59, lines 27-47 describe providing output associated with the DID having the highest score [a score of '162' in the example]: Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs. In this example, Table 47 is filtered by taking only the highest scored DID: DID 1 having a total score of 162 (represented by row 1 of Table 44 [should say Table 47]). At step 1714, the entity references from Table 45 associated with DID 1 may be used to identify the sought-after person. Table 48 illustrates these entity references)
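For illustration only, the three threshold options quoted from Bayliss col. 57 (a minimum total match score, a top percentage, or a certain number of the highest scores) can be sketched as below. The function and parameter names are assumptions, as are the inclusive minimum and the rounding up of the top fraction. Only DID 1 and the five score values come from the cited Table 47; the other labels are placeholders.

```python
import math


def filter_scores(scores, *, min_score=None, top_fraction=None, top_n=None):
    """Rank candidates by score (highest first) and apply the chosen cutoffs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if min_score is not None:
        # Assumed inclusive: a candidate at exactly the minimum is kept.
        ranked = [kv for kv in ranked if kv[1] >= min_score]
    if top_fraction is not None:
        # Assumed to round up and to keep at least one candidate.
        ranked = ranked[: max(1, math.ceil(len(ranked) * top_fraction))]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


# Score values from Table 47; labels other than "DID 1" are hypothetical.
table_47 = {"DID 1": 162, "DID w": 148, "DID x": 140, "DID y": 120, "DID z": 110}
best = filter_scores(table_47, top_n=1)
```

Setting `top_n=1` corresponds to the cited example in which only the highest scored DID is taken.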

Regarding claims 8 and 18, Salhin in view of Bayliss teaches all the features with respect to claims 1 and 10 above including:
wherein providing the recommendation for the requested data field based on the candidate scores for the candidate values comprises providing the candidate value with the highest candidate score as the recommendation for the requested data field. (Bayliss col. 57, lines 38-67: At step 1712, the DID score table 1726 is filtered by a predetermined threshold value to obtain those DIDs of the DID score table 1726 having the highest total match score; a law enforcement agency is seeking to identify a person in connection with a crime; Table 43 represents an exemplary match table 1730 generated from Table 42 as a result of steps 702-710 of method 700 (FIG. 7) using name matches and zip code matches; see then Bayliss col. 59, lines 27-47 providing output associated with the DID having the highest score [162 in the example]: Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs. In this example, Table 47 is filtered by taking only the highest scored DID: DID 1 having a total score of 162 (represented by row 1 of Table 44 [should say Table 47]). At step 1714, the entity references from Table 45 associated with DID 1 may be used to identify the sought-after person. Table 48 illustrates these entity references. From this table, the law enforcement agency can ascertain that the sought person goes by the names "David Hobbson," "D Hoobson" and "David Yates." It also can be determined that this person resides in the zip code 33445 and possibly in the zip code 33555)


Claims 7, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Salhin in view of Bayliss in further view of Balakrishnan et al., U.S. Patent Application Publication No. 2020/0151663 (published May 14, 2020, prior to the instant application date of July 2, 2020; hereinafter Balakrishnan).

Regarding claims 7 and 16, Salhin in view of Bayliss teaches all the features with respect to claims 6 and 15 above respectively including:
wherein ranking the candidate values based on the candidate score comprises ranking the candidate values first based on candidate score, (Bayliss col. 57, lines 39-53: At step 1712, the DID score table 1726 is filtered by a predetermined threshold value to obtain those DIDs of the DID score table 1726 having the highest total match score; see this in action in Bayliss Table 47, col. 59, lines 21-47: At step 1710, the total match scores for each of the entity references of a certain DID in Table 46 may be combined to arrive at an overall DID match value for the DID. Table 47 illustrates an exemplary sorted DID score table 1726 resulting from Table 46; Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs)
second based on cluster count indicating the number of clusters in which the candidate value appears, (Bayliss FIG. 15, col. 46, lines 30-50 describe its scoring being performed based on a number of appearances: At step 1508, the information in the selected data field for entity reference A may be compared to the corresponding information for entity reference B. If there is a match (i.e., equivalence or some acceptable degree of similarity), the score of the corresponding entity pair record in the score table 1522 may be adjusted (e.g., increased) at step 1510. For example, if the first name of entity reference A (having, for example, DID 1) matched the first name of entity reference B (having, for example, DID 3) then the score of the DID 3:DID 1 record in the score table 1522 would be adjusted by a determined or predetermined amount. The score adjustment may be constant regardless of the type of match or the magnitude of the score adjustment may be related to the type or degree of match (e.g., matching SSNs may be given a higher match score than matching first names))
Salhin in view of Bayliss does not expressly disclose:
and third based on record count indicating the number of similar records in which the candidate value appears.
However, Balakrishnan teaches this by teaching the following:
ranking the candidate values … third based on record count indicating the number of similar records in which the candidate value appears. (Balakrishnan FIG. 2B, ¶ 0041: Similarity scores are calculated for the historical product data records that satisfy the threshold and the historical product data records are ranked accordingly. Similarity scores between two products can be calculated using a variety of techniques. One approach is to count the number of attributes identical between the two (or more) products. For each identical attribute, the total of number possible values is summed up to represent the similarity score for that attribute. For example, the attribute “material” can have 100 different possible values. Two products having an identical value of “material: Cotton”, will contribute a value of 100 towards the similarity score, to indicate a strong signal of similarity) 
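For illustration only, the attribute-counting approach quoted from Balakrishnan ¶ 0041 can be sketched as below: each attribute with an identical value in both records contributes the number of possible values of that attribute. The function name, the dictionary layout, and the "color" attribute with its 12 possible values are assumptions for this example; only the "material" attribute with 100 possible values comes from the cited passage.

```python
def similarity_score(a: dict, b: dict, possible_values: dict) -> int:
    """Sum, over attributes with identical values in a and b,
    the number of possible values of each such attribute."""
    shared = a.keys() & b.keys()
    return sum(possible_values[attr] for attr in shared if a[attr] == b[attr])


# "material" has 100 possible values per the cited example; "color" is hypothetical.
possible_values = {"material": 100, "color": 12}
product_a = {"material": "Cotton", "color": "Red"}
product_b = {"material": "Cotton", "color": "Blue"}
score = similarity_score(product_a, product_b, possible_values)
```

Here only "material" matches, so the two products contribute 100 toward the similarity score, consistent with the quoted example.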
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use the known techniques of Balakrishnan for ranking data records based on calculated similarity scores to improve the similar techniques of Salhin as modified by Bayliss for ranking data records based on calculated scores.
In addition, the references (Salhin as modified by Bayliss, and Balakrishnan) disclose features that are directed to analogous art and to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to apply the teaching of Balakrishnan that similarity scores may be based on a variety of techniques, including potential diversity in attribute values, to the techniques of Salhin as modified by Bayliss regarding match scores to return a desired entity. A further motivation would be that utilizing machine learning techniques to predict and estimate information that is absent from the input data would have led one of ordinary skill to combine the teachings of the prior art references regarding similarity of data records to arrive at the claimed invention.
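For illustration only, the three-level ordering recited in claims 7 and 16 (first by candidate score, second by cluster count, third by record count) can be sketched as a sort on a composite key. The type and field names, the sample values, and the choice of descending order for every key are assumptions for this example and are not drawn from the cited references.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    value: str
    score: float
    cluster_count: int
    record_count: int


def rank_candidates(candidates):
    """Order by score, then cluster count, then record count (all descending)."""
    return sorted(
        candidates,
        key=lambda c: (c.score, c.cluster_count, c.record_count),
        reverse=True,
    )


# Hypothetical candidates: "a", "b", and "c" tie on score; "a" and "b" also tie on clusters.
ranked = rank_candidates(
    [
        Candidate("a", 0.90, 2, 5),
        Candidate("b", 0.90, 2, 7),
        Candidate("c", 0.90, 3, 1),
        Candidate("d", 0.95, 1, 1),
    ]
)
```

In this assumed data, "d" ranks first on score alone, "c" precedes "a" and "b" on cluster count, and "b" precedes "a" on record count.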

Regarding claim 9, similar to claim 17, Salhin in view of Bayliss in further view of Balakrishnan teaches all the features with respect to claim 7 above including:
wherein providing the recommendation for the requested data field based on the ranked candidate values comprises providing a subset of the candidate values with the highest candidate score based on a predetermined number of recommendations to be provided. (Bayliss col. 57, lines 41-54 describe a predetermined number: At step 1712, the DID score table 1726 is filtered by a predetermined threshold value to obtain those DIDs of the DID score table 1726 having the highest total match score. The threshold may be set at a certain minimum total match score, as a certain top percentage, as a certain number of the highest total match scores [shows the claimed 'number of recommendations'], and the like || Bayliss col. 59, lines 27-47 describe providing output associated with the DID having the highest score [a score of '162' in the example]: Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs. In this example, Table 47 is filtered by taking only the highest scored DID: DID 1 having a total score of 162 (represented by row 1 of Table 44 [should say Table 47]). At step 1714, the entity references from Table 45 associated with DID 1 may be used to identify the sought-after person. Table 48 illustrates these entity references)


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEDIDIAH P FERRER whose telephone number is (571)270-7695. The examiner can normally be reached Monday-Friday 9:00am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached at (571)272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/J.P.F/Examiner, Art Unit 2164
May 18, 2022

/ASHISH THOMAS/Supervisory Patent Examiner, Art Unit 2164