DETAILED ACTION
This Office action is in response to Applicant’s reply filed 08/22/2022.
Claims 1-20 are pending. Claims 1-2, 4-11, and 13-20 are amended.
Claims 1-20 are rejected.

Notice of AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner Notes/Objections
FIG. 4 and ¶ 0030 of the instant specification have been amended to correct typographical errors, and any objections are hereby withdrawn.
Claims 4 and 13 have been amended to correctly recite “a distance score greater than the specified value for the distance threshold.” The objections are hereby withdrawn.
Claims 1, 10, and 19 are objected to as they refer to “with do data” and should likely refer to “with no data.” Appropriate correction is required.

Statutory Review under 35 USC § 101
Claims 1-9 are directed towards a method and have been reviewed.
Claims 1-9 appear to be statutory as the method is directed to significantly more than an abstract idea based on currently known judicial exceptions.
Claims 10-18 are directed toward a system and have been reviewed.
Claims 10-18 appear to be statutory under 35 USC § 101, as they include hardware (“machine-readable medium”) as described in ¶ 0085 of the instant specification.
Claims 10-18 also perform the method of claims 1-9, which is directed to significantly more than an abstract idea based on currently known judicial exceptions.
Claims 19-20 are directed toward an article of manufacture and have been reviewed.
Claims 19-20 initially appear to be statutory, as the article of manufacture excludes transitory signals.
Claims 19-20 perform the method of claims 1 and 5, which is directed to significantly more than an abstract idea based on currently known judicial exceptions.

Claim Interpretation - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

Claims 10-18 are no longer interpreted under 35 U.S.C. 112(f) as they recite a system comprising a machine-readable medium and one or more processors.

Response to Amendments - 35 USC § 101
Claims 1-20 were rejected under 35 U.S.C. 101 but have been amended such that the claimed invention is directed to significantly more than an abstract idea based on currently known judicial exceptions. The 35 U.S.C. 101 rejection of claims 1-20 is hereby withdrawn.


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8; 10-15, 17-18; and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Salhin, U.S. Patent Application Publication No. 2020/0243134 (filed January 24, 2020, prior to the instant application date of July 2, 2020; hereinafter Salhin) in view of Bayliss, U.S. Patent No. 7,720,846 (published May 18, 2010; hereinafter Bayliss) in further view of Marcuccilli et al., U.S. Patent Application Publication No. 2015/0356545 (hereinafter Marcuccilli) in further view of Psota et al., U.S. Patent Application Publication No. 2019/0171655 (published June 6, 2019; hereinafter Psota).

Regarding claim 1, Salhin teaches:
A computer-implemented method comprising: receiving, at a computing system, a new data record... (Salhin FIG. 9, ¶ 0154-0157 describes a new data record through ¶ 0175: If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
accessing, by the computing system, master data comprising a plurality of representative data records, each representative data record representing a cluster of similar data records, (Salhin ¶ 0148-0149: if the individual data records in a cluster have textual descriptors “rent”, “rental payments”, “business rental” and “business premises rental”, a clustered textual descriptor may be “business rent” as this applies to all the data records in the cluster; FIG. 10, ¶ 0158 describe outputting a group of temporally-repeating textually matched data records 222 alongside a group descriptor 260: determine a group descriptor 260, and apply the group descriptor 260 to the data record group 222. The group descriptor 260 is a general descriptor for the group of data records applicable to all records in the group 222; see then Salhin teaching that these records are used as input [and thus address the claimed 'accessing'] in FIG. 16, ¶ 0178: A recorded group contains a plurality of data records 272 which exhibit textually similar descriptions, have a temporally repeating nature (e.g. a frequency of data record repeat); assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group)
and each similar data record having a confidence score indicating a confidence level… (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of steps such as those of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group)
comparing the new data record to each representative data record of the plurality of representative data records using the trained machine learning model to generate a distance score, for each representative data record, corresponding to a distance between the new data record and each representative data record; (Salhin FIGs. 16-17, ¶ 0178: A method to determine whether to include a new data record 274 into an existing group as illustrated comprises: assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group; the textual group descriptor based on the textual similarity of data records 272 in the group... determining a textual fitting parameter based on the textual group descriptor... determining if a textual portion of a new data record 274 corresponds to the textual fitting parameter; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0148 of Salhin contemplates alternative embodiments where comparison is performed against each individual record: Clusters of textually similar data records 214 may be stored in the textually matched data record log 216 with the data records having a textual similarity above a threshold 208 from the data record textual match analysis module 206; if the individual data records in a cluster have textual descriptors “rent”, “rental payments”, “business rental” and “business premises rental”; ¶ 0176-0177 shows this involving a machine learning model: new data records may be classified to either join existing data records which have been grouped, or to form a new group. The data group output from the unsupervised machine learning model(s) may be used to train a further machine learning model (e.g. a deep learning model) to comprehend the relationship between new data record fields (amount, descriptions, dates, . . . etc.) and the data recorded in the earlier formed group 222; the trained model may be used to auto-classify new data records into existing/new groups)
selecting a set of … data records having a generated distance score within a distance threshold of the new data record; (Salhin FIG. 15, ¶ 0173-0175: if the data records in the group of data records have a level of textual similarity below the textual similarity threshold, providing the data records for further similarity analysis; The abovementioned methods of FIG. 1-15 may be used to group data which has not been classified or categorised into group (i.e. has not been processed to determine any temporal repeating patterns or textual similarity with other data records prior to grouping). These methods may be generally considered to be unsupervised machine learning methods. If a new data record is recorded/created after forming the group, this data record may also fit in the group, i.e. it may be textually similar to the other data records in the group; re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair ... comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records)
analyzing the cluster of similar data records corresponding to each representative data record in … set of representative data records to generate candidate values for the data field ... of the new data record... (Salhin FIG. 8, ¶ 0152 describes the claimed 'cluster of similar data records': the data record amount identifier module 246 may be arranged to identify an amount value of each data record in the group of textually matched data records 246 from the amount portions of the data records. The module 246 therefore outputs a group of data records having identified amounts 248; FIG. 9, ¶ 0154-0156 describes the claimed 'candidate values' for a request involving the 'amount value' field: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity; ¶ 0175 shows that this can involve the claimed 'new data record': If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
generating a candidate score for each of the candidate values using the distance score for the representative data record to which the candidate value corresponds, (Salhin FIG. 9, ¶ 0154-0156 describe the claimed generating of a score for each of the candidate values: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity [see FIG. 9 referring to Amounts as "same, similar, stepwise, or different," which address the claimed 'scores' for the values], and if all the data records in the group of temporally-repeating textually matched data records 248 have a level of temporal, textual and amount value similarity above an overall similarity threshold; see that this involves the claimed 'distance score' through ¶ 0142-0144 teaching textual similarity being calculated based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0143: an edit distance may be used (an edit distance is a way of quantifying how dissimilar two strings (textual descriptors) [shows relevance to 'representative data record'] are from one another)
and the confidence score for the similar data record to which the candidate value corresponds; and (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group [this means the amount value similarity determination occurs after the confidence level check, fulfilling the claimed 'generating a candidate score' occurring as a result of an initial 'confidence score' determination])
providing a recommendation for the data field ... based on the candidate scores for the candidate values. (Salhin ¶ 0044-0048: The system may further comprise a data record amount identifier module arranged to identify an amount value of each data record in the group of textually matched data records from the amount portions of the data records; determine a degree of similarity of the identified amount values of the textually matched data records in the group; and assign an amount metric to the group based on the determined degree of similarity; FIG. 9, ¶ 0155-0156 are relevant to the claimed 'recommendation'; ¶ 0155 describes recommendations: Another example is of a particular stock item (e.g. powder chemical) which is shipped from a warehouse of a manufacturing plant. A future likely shipment may be predicted, and the amount required predicted, to aid future chemical manufacturing to have the required amount in stock; ¶ 0156 shows this being based on the claimed 'candidate scores': an amount metric 256 may be determined to classify the amount value of each data record in a group to be exact, similar (e.g. the same to within ±5%), stepwise (e.g. increasing by a set amount with time, within a predetermined tolerance), or different (e.g. no clear pattern); see also relevant FIG. 14, ¶ 0170 describing expected amounts 452-456 that are the same, similar, or dis-similar, respectively)
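For illustration only, the distance-based similarity metrics quoted above from Salhin ¶ 0141-0144 (an edit distance and a cosine distance based on a word count), together with a selection of group descriptors within a distance threshold, can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Salhin or of the instant application; the function names, the use of whitespace tokenization, and the threshold value are hypothetical.

```python
from collections import Counter
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def cosine_distance(a: str, b: str) -> float:
    """1 minus the cosine similarity of the word-count vectors of a and b."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def within_threshold(new_text: str, descriptors: list, threshold: float) -> list:
    """Return (descriptor, distance) pairs whose cosine distance is <= threshold.

    The threshold is a hypothetical parameter chosen by the caller.
    """
    scored = [(d, cosine_distance(new_text, d)) for d in descriptors]
    return [(d, s) for d, s in scored if s <= threshold]
```

Under these assumptions, a call such as within_threshold("business rent", ["rent", "office supplies"], 0.5) keeps only the descriptor that shares a word with the new record's text.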
Salhin does not expressly disclose:
training, by a computing system, a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records on specified fields of the data records;
Salhin does not expressly disclose a confidence level that the similar data record corresponds to the cluster.
Salhin further does not expressly disclose the bolded limitations seen below:
receiving, at a computing system, a new data record comprising a data field with no data;
selecting a set of representative data records having a generated distance score within a distance threshold of the new data record;
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records to generate candidate values for the data field with no data of the new data record by determining a data field in each record corresponding to the data field with no data and selecting a value of the data field in each record as a candidate value;
Salhin further does not expressly disclose the data field with [n]o data.
However, Bayliss teaches the following:
each representative data record representing a cluster of similar data records, and each similar data record having a confidence score indicating a confidence level that the similar data record corresponds to the cluster; (Bayliss FIG. 5, col. 19, lines 11-58 describe linked entity references comprising data fields: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other; col. 19, lines 24-34 describes a confidence value assigned to the linkage: a probablistic process whereby a confidence value is assigned to a proposed link between two entity references based on the degree that the field values match optionally adjusted by a weighting factor [the linked references based on matching are interpreted as fulfilling the claimed 'cluster of similar data records']; see the entities being linked to each other in light of FIG. 8, col. 27, lines 31-53 and Table 9)
Bayliss further teaches:
selecting a set of representative data records having a generated distance score within a distance threshold... (Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file. A data field from the subset of data fields included in the entity references is selected (step 706) and the information or value in the selected data field of entity reference A is compared (step 708) to the information or value in the selected data field of entity reference B; col. 27, line 42-col. 28, line 42 and Table 9 describe a graph 800 graphically demonstrating two linked entity references; see then Bayliss col. 37, lines 10-25 showing that this can involve a distance: The "proximity" of entity references to each other may be represented graphically as the distance (i.e., variance) between the entity references at each extreme of each axis representing a selected data field)
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records… (Bayliss FIG. 5, col. 19, lines 10-58: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other [relevant to the claimed 'cluster of similar data records']; see this in light of FIG. 8, col. 27, line 40-col. 28, line 12: Entity references 802-808 represent the entity references of Table 8 having RIDs 1-4, respectively. The arrows between the entity references 802-808 represent the matches determined by performing steps 704-710 in comparing first/middle/last name, SSN, and/or DOB data fields; see steps 704-710 performed in Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file [shows representative data records]; After each of the subset of data fields of the entity reference pair is compared for a match and a record added to the match table 722 when a match occurs, a new entity reference pair may be selected from the master file and steps 706-710 may be repeated for the new entity reference pair) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Bayliss to assign confidence values to similar entity references to improve similar techniques within Salhin to determine similar data records.
In addition, both of the references (Salhin and Bayliss) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to combine the techniques of Bayliss regarding identifying related entity references and assigning a weight-adjustable confidence value with the techniques of Salhin regarding grouping of textually-matched data records. A further motivation would be to overcome many of the limitations affecting known attempts at mining, interpreting and understanding data (as in Bayliss col. 2, line 62-col. 3, line 13), which would have led one of ordinary skill to combine prior art reference teachings regarding similarity of data records to arrive at the claimed invention.
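For illustration only, the confidence value described in the Bayliss passage quoted above (a value assigned to a proposed link based on the degree to which field values match, optionally adjusted by a weighting factor) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Bayliss; the function names, the exact-match comparison, the weighted-fraction formula, and the example weights are hypothetical.

```python
def link_confidence(ref_a: dict, ref_b: dict, weights: dict) -> float:
    """Weighted fraction of the compared fields whose values match exactly.

    `weights` maps each compared field name to a hypothetical weighting factor.
    A field that is absent or None in ref_a is never counted as a match.
    """
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(
        w for field, w in weights.items()
        if ref_a.get(field) is not None and ref_a.get(field) == ref_b.get(field)
    )
    return matched / total


def is_linked(ref_a: dict, ref_b: dict, weights: dict, threshold: float) -> bool:
    """Propose a link when the confidence value exceeds the threshold."""
    return link_confidence(ref_a, ref_b, weights) > threshold
```

In this sketch, two entity references that agree on last name and SSN but differ on DOB, with SSN weighted twice as heavily as the other fields, would receive a confidence value of 0.75.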
Salhin in view of Bayliss does not expressly disclose:
training, by a computing system, a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records on specified fields of the data records;
Salhin in view of Bayliss further does not expressly disclose the bolded limitations seen below:
receiving, at a computing system, a new data record comprising a data field with no data;
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records to generate candidate values for the data field with no data of the new data record by determining a data field in each record corresponding to the data field with no data and selecting a value of the data field in each record as a candidate value;
Salhin in view of Bayliss further does not expressly disclose the data field with [n]o data.
However, Marcuccilli teaches the following:
Marcuccilli teaches on specified fields of the data records. (Marcuccilli ¶ 0020: receiving a captured image of a transaction document; identifying text and any machine readable data on the image; extracting data fields from the image; populating a transaction record with transaction data fields from the extracted data fields; comparing and matching the transaction data fields of the transaction record with like data fields of a stored transaction record to determine if the transaction record is a complete transaction record or an incomplete transaction record)
Marcuccilli teaches receiving, at a computing system, a new data record comprising a data field with no data; (Marcuccilli ¶ 0020: receiving a captured image of a transaction document; identifying text and any machine readable data on the image; extracting data fields from the image; populating a transaction record with transaction data fields from the extracted data fields; comparing and matching the transaction data fields of the transaction record with like data fields of a stored transaction record to determine if the transaction record is a complete transaction record or an incomplete transaction record; determining if the unmatched transaction data field from the incomplete transaction record is empty; FIG. 5, ¶ 0067-0075 describing processing a new transaction document, see particularly ¶ 0073-0075: A payment record is populated from the located data fields; If any payment data fields are empty or missing from the incomplete record)
Marcuccilli teaches generating candidate values for the data field with no data of the new data record by determining a data field in each record corresponding to the data field with no data and selecting a value of the data field in each record as a candidate value. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0067-0075 describing processing a new transaction document, see particularly ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data [shows claimed value of corresponding data field]; ¶ 0065 strengthens Marcuccilli addressing a plurality of records: This extracted data is then compared to (validated against) known information in a database that may include prior transaction information from a particular user of the system and/or a database of information from all users (payers and payees) within the system) 
Marcuccilli thus teaches the data field with [n]o data. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Marcuccilli involving deriving information from similar data records to improve similar techniques within Salhin as modified to determine similar data records.
In addition, both of the references (Salhin as modified and Marcuccilli) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as management of similar data entities.
Motivation to do so would be to combine the techniques of Marcuccilli involving applying missing information derived from similar data records to a new record with the techniques of Salhin as modified regarding grouping of textually-matched data records. A further motivation would be to allow correction of invalid, misread, empty, or missing data (as in Marcuccilli ¶ 0074-0076), which would have led one of ordinary skill to combine prior art reference teachings regarding similar data records to arrive at the claimed invention.
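For illustration only, the Marcuccilli ¶ 0075 example quoted above (an account number is found but no address, and the missing address is input from a previous payment record) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Marcuccilli; the function name, the dictionary record layout, and the field names are hypothetical.

```python
def fill_empty_fields(new_record: dict, stored_records: list,
                      key: str = "account_number") -> dict:
    """Return a copy of new_record with empty or missing fields filled in.

    Values are copied from the first stored record that shares the same
    non-empty `key` value. The input record is left unmodified.
    """
    filled = dict(new_record)
    match = next(
        (r for r in stored_records
         if r.get(key) and r.get(key) == new_record.get(key)),
        None,
    )
    if match is None:
        return filled  # no prior record to draw candidate values from
    for field, value in match.items():
        if filled.get(field) in (None, ""):
            filled[field] = value
    return filled
```

Fields that already hold a value in the new record are left as they are; only empty or missing fields take a value from the matched stored record.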
Salhin in view of Bayliss and Marcuccilli does not expressly disclose:
training, by a computing system, a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records...
However, Psota teaches:
training, by a computing system, a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records... (Psota ¶ 0024 shows the claimed training: techniques for these methods and systems may include deep learning based on principles such as word vectors, semantic hashing, synthetic machine learning training sets and the like; word vectors, semantic hashing and the like may facilitate determining when terms that appear in transaction records indicate a degree of similarity and when those same terms do not; automobile manufacturer names can be close semantically within a group that they represent [shows claimed distance scoring]; an example includes Ford and Chevrolet. These two names may represent a group of U.S. automobile manufacturers, and therefore would suggest some degree of similarity in transaction records that include these terms; Psota FIG. 2, ¶ 0070-0085 describe processing data from a group of objects (e.g., a set of transactions, shipping records) [shows this being based on the claimed 'history sourcing events data records'], see particularly ¶ 0079 and ¶ 0082; ¶ 0079: performing (210) [should say (220)], for each source object in a group of source objects, one or more lookup operations on each target object in the target object index [shows comparisons between a new record and representative data records]; ¶ 0082: converting (230) each source target object pair into a numeric feature vector; the class of features includes one or more of a choice of matching attribute or data value (e.g. company name) from each of the two objects in the source target object pair, a string transformation function T, a string similarity function on the pair of transformed strings S. The overall feature is then a number f defined by f=S(T(source.attribute), T(target.attribute)))
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Psota to determine semantic closeness between source target object pairs to improve similar techniques within Salhin as modified involving determining similar data records.
In addition, both of the references (Salhin as modified and Psota) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to combine the techniques of Psota regarding deep learning and string similarity with the techniques of Salhin as modified regarding grouping of textually-matched data records. A further motivation would be that a system that can process transaction records and extract data from free-form data fields across a plurality of records may facilitate improved classification and the like of such records for platforms that process and utilize this information, which would have led one of ordinary skill to combine prior art reference teachings regarding similarity of data records to arrive at the claimed invention.
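For illustration only, the feature quoted above from Psota ¶ 0082, f=S(T(source.attribute), T(target.attribute)), can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Psota; the particular transformation T (lowercasing and punctuation removal), the particular similarity function S (the ratio from the Python standard library's difflib.SequenceMatcher), and the attribute name are hypothetical choices.

```python
import re
from difflib import SequenceMatcher


def transform(value: str) -> str:
    """T: lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", value.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """S: a string similarity ratio in the range [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def feature(source: dict, target: dict, attribute: str) -> float:
    """f = S(T(source.attribute), T(target.attribute)) for one attribute."""
    return similarity(transform(source[attribute]), transform(target[attribute]))
```

Under these assumptions, company names that differ only in case and punctuation transform to the same string and yield a feature value of 1.0, while unrelated names yield a lower value. One such number per attribute and per (T, S) choice could then be collected into the numeric feature vector that the quoted passage describes.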


Regarding claim 10, Salhin teaches:
A system comprising: a memory that stores instructions, and one or more processors configured by the instructions to perform operations comprising: (Salhin ¶ 0097: a computer, comprising one or more processing devices arranged to, in use, execute computer readable instructions to implement a data record grouping application, wherein the data record grouping application is arranged to perform any method described herein; ¶ 0193: embodiments of the present invention can be realised in the form of hardware, software or a combination of hardware and software. Any such software may be stored in the form of volatile or non-volatile storage such as, for example, a storage device like a ROM, whether erasable or rewritable or not, or in the form of memory; the storage devices and storage media are embodiments of machine-readable storage that are suitable for storing a program or programs that, when executed, implement embodiments of the present invention)
receiving a new data record... (Salhin FIG. 9, ¶ 0154-0157 describes a new data record through ¶ 0175: If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
accessing master data comprising a plurality of representative data records, each representative data record representing a cluster of similar data records, (Salhin ¶ 0148-0149: if the individual data records in a cluster have textual descriptors “rent”, “rental payments”, “business rental” and “business premises rental”, a clustered textual descriptor may be “business rent” as this applies to all the data records in the cluster; FIG. 10, ¶ 0158 describe outputting a group of temporally-repeating textually matched data records 222 alongside a group descriptor 260: determine a group descriptor 260, and apply the group descriptor 260 to the data record group 222. The group descriptor 260 is a general descriptor for the group of data records applicable to all records in the group 222; see then Salhin teaching that these records are used as input [and thus address the claimed 'accessing'] in FIG. 16, ¶ 0178: A recorded group contains a plurality of data records 272 which exhibit textually similar descriptions, have a temporally repeating nature (e.g. a frequency of data record repeat); assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group)
and each similar data record having a confidence score indicating a confidence level… (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of steps such as those of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group)
comparing the new data record to each representative data record of the plurality of representative data records using the trained machine learning model to generate a distance score, for each representative data record, corresponding to a distance between the new data record and each representative data record; (Salhin FIGs. 16-17, ¶ 0178:  A method to determine whether to include a new data record 274 into an existing group as illustrated comprises: assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group; the textual group descriptor based on the textual similarity of data records 272 in the group... determining a textual fitting parameter based on the textual group descriptor... determining if a textual portion of a new data record 274 corresponds to the textual fitting parameter; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. 
The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0148 of Salhin contemplates alternative embodiments where comparison is performed against each individual record: Clusters of textually similar data records 214 may be stored in the textually matched data record log 216 with the data records having a textual similarity above a threshold 208 from the data record textual match analysis module 206; if the individual data records in a cluster have textual descriptors “rent”, “rental payments”, “business rental” and “business premises rental”; ¶ 0176-0177 shows this involving a machine learning model: new data records may be classified to either join existing data records which have been grouped, or to form a new group. The data group output from the unsupervised machine learning model(s) may be used to train a further machine learning model (e.g. a deep learning model) to comprehend the relationship between new data record fields (amount, descriptions, dates, . . . etc.) and the data recorded in the earlier formed group 222; the trained model may be used to auto-classify new data records into existing/new groups)
selecting a set of … data records having a generated distance score within a distance threshold of the new data record; (Salhin FIG. 15, ¶ 0173-0175: if the data records in the group of data records have a level of textual similarity below the textual similarity threshold, providing the data records for further similarity analysis; The abovementioned methods of FIG. 1-15 may be used to group data which has not been classified or categorised into group (i.e. has not been processed to determine any temporal repeating patterns or textual similarity with other data records prior to grouping). These methods may be generally considered to be unsupervised machine learning methods. If a new data record is recorded/created after forming the group, this data record may also fit in the group, i.e. it may be textually similar to the other data records in the group; re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair ... comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records)
analyzing the cluster of similar data records corresponding to each representative data record in … set of representative data records to generate candidate values for the data field ... of the new data record... (Salhin FIG. 8, ¶ 0152 describes the claimed 'cluster of similar data records': the data record amount identifier module 246 may be arranged to identify an amount value of each data record in the group of textually matched data records 246 from the amount portions of the data records. The module 246 therefore outputs a group of data records having identified amounts 248; FIG. 9, ¶ 0154-0156 describes the claimed 'candidate values' for a request involving the 'amount value' field: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity; ¶ 0175 shows that this can involve the claimed 'new data record': If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
generating a candidate score for each of the candidate values using the distance score for the representative data record to which the candidate value corresponds, (Salhin FIG. 9, ¶ 0154-0156 describe the claimed generating of a score for each of the candidate values: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity [see FIG. 9 referring to Amounts as "same, similar, stepwise, or different," which address the claimed 'scores' for the values], and if all the data records in the group of temporally-repeating textually matched data records 248 have a level of temporal, textual and amount value similarity above an overall similarity threshold; see that this involves the claimed 'distance score' through ¶ 0142-0144 teaching textual similarity being calculated based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0143: an edit distance may be used (an edit distance is a way of quantifying how dissimilar two strings (textual descriptors) [shows relevance to 'representative data record'] are from one another))
and the confidence score for the similar data record to which the candidate value corresponds; and (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group [this means the amount value similarity determination occurs after the confidence level check, fulfilling the claimed 'generating a candidate score' occurring as a result of an initial 'confidence score' determination])
providing a recommendation for the data field ... based on the candidate scores for the candidate values. (Salhin describes "requested data field" through ¶ 0044-0048: The system may further comprise a data record amount identifier module arranged to identify an amount value of each data record in the group of textually matched data records from the amount portions of the data records; determine a degree of similarity of the identified amount values of the textually matched data records in the group; and assign an amount metric to the group based on the determined degree of similarity; FIG. 9, ¶ 0155-0156 are relevant to the claimed 'recommendation'; ¶ 0155 describes recommendations: Another example is of a particular stock item (e.g. powder chemical) which is shipped from a warehouse of a manufacturing plant. A future likely shipment may be predicted, and the amount required predicted, to aid future chemical manufacturing to have the required amount in stock; ¶ 0156 shows this being based on the claimed 'candidate scores': an amount metric 256 may be determined to classify the amount value of each data record in a group to be exact, similar (e.g. the same to within ±5%), stepwise (e.g. increasing by a set amount with time, within a predetermined tolerance), or different (e.g. no clear pattern); see also relevant FIG. 14, ¶ 0170 describing expected amounts 452-456 that are the same, similar, or dis-similar, respectively)
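For illustration only, two of the similarity metrics quoted above from Salhin ¶ 0141-0144 (an edit distance for a pair of data records and a cosine distance based on a word count) may be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the whitespace tokenization, and the use of Levenshtein distance as the edit distance are assumptions made for this example and are not taken from Salhin.

```python
from collections import Counter
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cosine_distance(a, b):
    """1 minus the cosine similarity of the word-count vectors of a and b."""
    ca = Counter(a.lower().split())
    cb = Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    # Assumption: an empty descriptor is treated as maximally distant.
    return 1.0 if norm == 0 else 1.0 - dot / norm
```

Under these assumptions, the textual descriptors “business rental” and “business premises rental” share two of three distinct words and therefore yield a small cosine distance, while descriptors with no words in common yield a distance of 1.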
Salhin does not expressly disclose:
training a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records on specified fields of the data records;
Salhin does not expressly disclose a confidence level that the similar data record corresponds to the cluster.
Salhin further does not expressly disclose the bolded limitations seen below:
receiving a new data record comprising a data field with no data;
selecting a set of representative data records having a generated distance score within a distance threshold of the new data record;
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records to generate candidate values for the data field with no data of the new data record by determining a data field in each record corresponding to the data field with no data and selecting a value of the data field in each record as a candidate value;
Salhin further does not expressly disclose the data field with [n]o data.
However, Bayliss teaches the following:
each representative data record representing a cluster of similar data records, and each similar data record having a confidence score indicating a confidence level that the similar data record corresponds to the cluster; (Bayliss FIG. 5, col. 19, lines 11-58 describe linked entity references comprising data fields: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other; col. 19, lines 24-34 describes a confidence value assigned to the linkage: a probablistic process whereby a confidence value is assigned to a proposed link between two entity references based on the degree that the field values match optionally adjusted by a weighting factor [the linked references based on matching are interpreted as fulfilling the claimed 'cluster of similar data records']; see the entities being linked to each other in light of FIG. 8, col. 27, lines 31-53 and Table 9)
Bayliss further teaches:
selecting a set of representative data records having a generated distance score within a distance threshold... (Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file. A data field from the subset of data fields included in the entity references is selected (step 706) and the information or value in the selected data field of entity reference A is compared (step 708) to the information or value in the selected data field of entity reference B; col. 27, line 42-col. 28, line 42 and Table 9 describe a graph 800 graphically demonstrating two linked entity references; see then Bayliss col. 37, lines 10-25 showing that this can involve a distance: The "proximity" of entity references to each other may be represented graphically as the distance (i.e., variance) between the entity references at each extreme of each axis representing a selected data field)
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records… (Bayliss FIG. 5, col. 19, lines 10-58: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other [relevant to the claimed 'cluster of similar data records']; see this in light of FIG. 8, col. 27, line 40-col. 28, line 12: Entity references 802-808 represent the entity references of Table 8 having RIDs 1-4, respectively. The arrows between the entity references 802-808 represent the matches determined by performing steps 704-710 in comparing first/middle/last name, SSN, and/or DOB data fields; see steps 704-710 performed in Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file [shows representative data records]; After each of the subset of data fields of the entity reference pair is compared for a match and a record added to the match table 722 when a match occurs, a new entity reference pair may be selected from the master file and steps 706-710 may be repeated for the new entity reference pair) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Bayliss to assign confidence values to similar entity references to improve similar techniques within Salhin to determine similar data records.
In addition, both of the references (Salhin and Bayliss) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to use the techniques of Bayliss regarding identifying related entity references and assigning a weight-adjustable confidence value to the techniques of Salhin regarding grouping of textually-matched data records. Motivation to do so would also be that solving many of the limitations affecting known attempts at mining, interpreting and understanding data (as in Bayliss col. 2, line 62-col. 3, line 13) would have led one of ordinary skill to combine prior art reference teachings regarding similarity of data records to arrive at the claimed invention.
Salhin in view of Bayliss does not expressly disclose:
training a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records on specified fields of the data records;
Salhin in view of Bayliss further does not expressly disclose the bolded limitations seen below:
receiving a new data record comprising a data field with no data;
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records to generate candidate values for the data field with no data of the new data record by determining a data field in each record corresponding to the data field with no data and selecting a value of the data field in each record as a candidate value;
Salhin in view of Bayliss further does not expressly disclose the data field with [n]o data.
However, Marcuccilli teaches the following:
Marcuccilli teaches on specified fields of the data records. (Marcuccilli ¶ 0020: receiving a captured image of a transaction document; identifying text and any machine readable data on the image; extracting data fields from the image; populating a transaction record with transaction data fields from the extracted data fields; comparing and matching the transaction data fields of the transaction record with like data fields of a stored transaction record to determine if the transaction record is a complete transaction record or an incomplete transaction record)
Marcuccilli teaches receiving a new data record comprising a data field with no data; (Marcuccilli ¶ 0020: receiving a captured image of a transaction document; identifying text and any machine readable data on the image; extracting data fields from the image; populating a transaction record with transaction data fields from the extracted data fields; comparing and matching the transaction data fields of the transaction record with like data fields of a stored transaction record to determine if the transaction record is a complete transaction record or an incomplete transaction record; determining if the unmatched transaction data field from the incomplete transaction record is empty; FIG. 5, ¶ 0067-0075 describing processing a new transaction document, see particularly ¶ 0073-0075: A payment record is populated from the located data fields; If any payment data fields are empty or missing from the incomplete record)
Marcuccilli teaches generating candidate values for the data field with no data of the new data record by determining a data field in each record corresponding to the data field with no data and selecting a value of the data field in each record as a candidate value. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0067-0075 describing processing a new transaction document, see particularly ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data [shows claimed value of corresponding data field]; ¶ 0065 strengthens Marcuccilli addressing a plurality of records: This extracted data is then compared to (validated against) known information in a database that may include prior transaction information from a particular user of the system and/or a database of information from all users (payers and payees) within the system) 
Marcuccilli thus teaches the data field with [n]o data. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data)
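For illustration only, the behavior quoted above from Marcuccilli ¶ 0075 (an account number is found but no address, and the missing address is input using data from a previous payment record) may be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the dictionary representation of a record, and the field names `account_number` and `address` are hypothetical and are not taken from Marcuccilli.

```python
def fill_empty_fields(new_record, stored_records, key="account_number"):
    """Return a copy of new_record with empty fields filled from a stored record sharing `key`."""
    filled = dict(new_record)
    # Find the first stored record whose key field is non-empty and matches.
    match = next(
        (r for r in stored_records if r.get(key) and r.get(key) == new_record.get(key)),
        None,
    )
    if match is None:
        return filled  # nothing to copy from; empty fields stay empty
    for field, value in match.items():
        # Only fields that are missing or empty are filled; existing values are kept.
        if filled.get(field) in (None, ""):
            filled[field] = value
    return filled


previous = [{"account_number": "123", "address": "1 Main St"}]
incomplete = {"account_number": "123", "address": ""}
completed = fill_empty_fields(incomplete, previous)
```

In this sketch the value of the corresponding field in the stored record serves as the value input into the empty field of the new record.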
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Marcuccilli involving deriving information from similar data records to improve similar techniques within Salhin as modified to determine similar data records.
In addition, both of the references (Salhin as modified and Marcuccilli) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as management of similar data entities.
Motivation to do so would be to use the techniques of Marcuccilli involving applying missing information derived from similar data records to a new record with the techniques of Salhin as modified regarding grouping of textually-matched data records. Motivation to do so would also be that allowing correction of invalid, misread, empty, or missing data (as in Marcuccilli ¶ 0074-0076) would have led one of ordinary skill to combine prior art reference teachings regarding similar data records to arrive at the claimed invention.
Salhin in view of Bayliss and Marcuccilli does not expressly disclose:
training a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records...
However, Psota teaches:
training a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records... (Psota ¶ 0024 shows the claimed training: techniques for these methods and systems may include deep learning based on principles such as word vectors, semantic hashing, synthetic machine learning training sets and the like; word vectors, semantic hashing and the like may facilitate determining when terms that appear in transaction records indicate a degree of similarity and when those same terms do not; automobile manufacturer names can be close semantically within a group that they represent [shows claimed distance scoring]; an example includes Ford and Chevrolet. These two names may represent a group of U.S. automobile manufacturers, and therefore would suggest some degree of similarity in transaction records that include these terms || Psota FIG. 2, ¶ 0070-0085 describe processing data from a group of objects (e.g., a set of transactions, shipping records) [shows this being based on the claimed 'history sourcing events data records'], see particularly ¶ 0079 and ¶ 0082; ¶ 0079: performing (210) [should say (220)], for each source object in a group of source objects, one or more lookup operations on each target object in the target object index [shows comparisons between a new record and representative data records]; ¶ 0082: converting (230) each source target object pair into a numeric feature vector; the class of features includes one or more of a choice of matching attribute or data value (e.g. company name) from each of the two objects in the source target object pair, a string transformation function T, a string similarity function on the pair of transformed strings S. The overall feature is then a number f defined by f=S(T(source.attribute), T(target.attribute))) 
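For illustration only, the feature quoted above from Psota ¶ 0082, f=S(T(source.attribute), T(target.attribute)), may be sketched as follows. This is a minimal sketch under stated assumptions: the particular string transformation T (lowercasing and collapsing punctuation), the particular string similarity S (the ratio from Python's standard `difflib.SequenceMatcher`), and the attribute name `company_name` are choices made for this example and are not taken from Psota.

```python
import re
from difflib import SequenceMatcher


def transform(s):
    """T: lowercase and collapse runs of non-alphanumeric characters to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()


def similarity(a, b):
    """S: difflib similarity ratio, a number in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def feature(source, target, attribute):
    """f = S(T(source.attribute), T(target.attribute)) for one source target object pair."""
    return similarity(transform(source[attribute]), transform(target[attribute]))


source = {"company_name": "ACME, Inc."}
target = {"company_name": "acme inc"}
f = feature(source, target, "company_name")
```

A numeric feature vector for a source target object pair would then collect one such number f per chosen combination of attribute, T, and S.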
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Psota to determine semantic closeness between source target object pairs to improve similar techniques within Salhin as modified involving determining similar data records.
In addition, both of the references (Salhin as modified and Psota) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to use the techniques of Psota regarding deep learning and string similarity with the techniques of Salhin as modified regarding grouping of textually-matched data records. Further motivation would be that implementing a system that can process transaction records and extract data from free-form data fields across a plurality of records may facilitate improving classification and the like of such records for platforms that process and utilize this information, which would have led one of ordinary skill to combine prior art reference teachings regarding similarity of data records to arrive at the claimed invention.

Regarding claim 19, Salhin teaches:
A non-transitory computer-readable medium comprising instructions stored thereon that are executable by at least one processor to cause a computing device to perform operations comprising: (Salhin ¶ 0073: computer program code which, when executed by a computer, is arranged to perform any method claimed herein. The computer program code may be stored on a non-transitory computer-readable medium; ¶ 0097: a computer, comprising one or more processing devices arranged to, in use, execute computer readable instructions to implement a data record grouping application, wherein the data record grouping application is arranged to perform any method described herein; ¶ 0193: the storage devices and storage media are embodiments of machine-readable storage that are suitable for storing a program or programs that, when executed, implement embodiments of the present invention)
receiving a new data record... (Salhin FIG. 9, ¶ 0154-0157 describes a new data record through ¶ 0175: If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
accessing master data comprising a plurality of representative data records, each representative data record representing a cluster of similar data records, (Salhin ¶ 0148-0149: if the individual data records in a cluster have textual descriptors “rent”, “rental payments”, “business rental” and “business premises rental”, a clustered textual descriptor may be “business rent” as this applies to all the data records in the cluster; FIG. 10, ¶ 0158 describe outputting a group of temporally-repeating textually matched data records 222 alongside a group descriptor 260: determine a group descriptor 260, and apply the group descriptor 260 to the data record group 222. The group descriptor 260 is a general descriptor for the group of data records applicable to all records in the group 222; see then Salhin teaching that these records are used as input [and thus address the claimed 'accessing'] in FIG. 16, ¶ 0178: A recorded group contains a plurality of data records 272 which exhibit textually similar descriptions, have a temporally repeating nature (e.g. a frequency of data record repeat); assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group)
and each similar data record having a confidence score indicating a confidence level… (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of steps such as those of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group)
comparing the new data record to each representative data record of the plurality of representative data records using the trained machine learning model to generate a distance score, for each representative data record, corresponding to a distance between the new data record and each representative data record; (Salhin FIGs. 16-17, ¶ 0178:  A method to determine whether to include a new data record 274 into an existing group as illustrated comprises: assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group; the textual group descriptor based on the textual similarity of data records 272 in the group... determining a textual fitting parameter based on the textual group descriptor... determining if a textual portion of a new data record 274 corresponds to the textual fitting parameter; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. 
The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0148 of Salhin contemplates alternative embodiments where comparison is performed against each individual record: Clusters of textually similar data records 214 may be stored in the textually matched data record log 216 with the data records having a textual similarity above a threshold 208 from the data record textual match analysis module 206; if the individual data records in a cluster have textual descriptors “rent”, “rental payments”, “business rental” and “business premises rental”; ¶ 0176-0177 shows this involving a machine learning model: new data records may be classified to either join existing data records which have been grouped, or to form a new group. The data group output from the unsupervised machine learning model(s) may be used to train a further machine learning model (e.g. a deep learning model) to comprehend the relationship between new data record fields (amount, descriptions, dates, . . . etc.) and the data recorded in the earlier formed group 222; the trained model may be used to auto-classify new data records into existing/new groups)
selecting a set of … data records having a generated distance score within a distance threshold of the new data record; (Salhin FIG. 15, ¶ 0173-0175: if the data records in the group of data records have a level of textual similarity below the textual similarity threshold, providing the data records for further similarity analysis; The abovementioned methods of FIG. 1-15 may be used to group data which has not been classified or categorised into group (i.e. has not been processed to determine any temporal repeating patterns or textual similarity with other data records prior to grouping). These methods may be generally considered to be unsupervised machine learning methods. If a new data record is recorded/created after forming the group, this data record may also fit in the group, i.e. it may be textually similar to the other data records in the group; re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair ... comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records)
analyzing the cluster of similar data records corresponding to each representative data record in … set of representative data records to generate candidate values for the data field ... of the new data record... (Salhin FIG. 8, ¶ 0152 describes the claimed 'cluster of similar data records': the data record amount identifier module 246 may be arranged to identify an amount value of each data record in the group of textually matched data records 246 from the amount portions of the data records. The module 246 therefore outputs a group of data records having identified amounts 248; FIG. 9, ¶ 0154-0156 describes the claimed 'candidate values' for a request involving the 'amount value' field: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity; ¶ 0175 shows that this can involve the claimed 'new data record': If a new data record is recorded/created after forming the group, this data record may also fit in the group; One way to capture this new data record in the existing group is to re-run the above described method again for the full data set (the data records in the group and the new data record) to capture the new group of data records, including the new record)
generating a candidate score for each of the candidate values using the distance score for the representative data record to which the candidate value corresponds, (Salhin FIG. 9, ¶ 0154-0156 describe the claimed generating of a score for each of the candidate values: the data record amount similarity module 254 may be arranged to analyse the temporally-repeating textually matched data records 248 in the group for amount value similarity [see FIG. 9 referring to Amounts as "same, similar, stepwise, or different," which address the claimed 'scores' for the values], and if all the data records in the group of temporally-repeating textually matched data records 248 have a level of temporal, textual and amount value similarity above an overall similarity threshold; see that this involves the claimed 'distance score' through ¶ 0142-0144 teaching textual similarity being calculated based on a distance: the data record pair matching module is arranged to identify the textual similarity metric of the data records of each pair of data records by applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair. The one or more similarity metric methods comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0143: an edit distance may be used (an edit distance is a way of quantifying how dissimilar two strings (textual descriptors) [shows relevance to 'representative data record'] are from one another))
and the confidence score for the similar data record to which the candidate value corresponds; and (Salhin FIG. 3, ¶ 0127-0128 describe the claimed 'confidence score': the quality of the identified recurrent/repeated data records may be determined, for example as a confidence level check of a future data record prediction; recurrent data records are grouped and provided as matched data records 108; see that these records are input at the beginning of FIG. 9, ¶ 0154-0155: The data records with amounts identified 248 (which may be temporally repeating [thus occurring after the 'confidence' level check of Salhin] and/or textually matched) are provided to the data record amount similarity module 254 to determine a similarity of the amounts of the data records in the group [this means the amount value similarity determination occurs after the confidence level check, fulfilling the claimed 'generating a candidate score' occurring as a result of an initial 'confidence score' determination])
providing a recommendation for the requested data field based on the candidate scores for the candidate values. (Salhin describes "requested data field" through ¶ 0044-0048: The system may further comprise a data record amount identifier module arranged to identify an amount value of each data record in the group of textually matched data records from the amount portions of the data records; determine a degree of similarity of the identified amount values of the textually matched data records in the group; and assign an amount metric to the group based on the determined degree of similarity; FIG. 9, ¶ 0155-0156 are relevant to the claimed 'recommendation'; ¶ 0155 describes recommendations: Another example is of a particular stock item (e.g. powder chemical) which is shipped from a warehouse of a manufacturing plant. A future likely shipment may be predicted, and the amount required predicted, to aid future chemical manufacturing to have the required amount in stock; ¶ 0156 shows this being based on the claimed 'candidate scores': an amount metric 256 may be determined to classify the amount value of each data record in a group to be exact, similar (e.g. the same to within ±5%), stepwise (e.g. increasing by a set amount with time, within a predetermined tolerance), or different (e.g. no clear pattern); see also relevant FIG. 14, ¶ 0170 describing expected amounts 452-456 that are the same, similar, or dis-similar, respectively) 
Salhin does not expressly disclose:
training a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records on specified fields of the data records;
Salhin does not expressly disclose a confidence level that the similar data record corresponds to the cluster.
Salhin further does not expressly disclose the bolded limitations seen below:
receiving a new data record comprising a data field with no data;
selecting a set of representative data records having a generated distance score within a distance threshold of the new data record;
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records to generate candidate values for the data field with no data of the new data record by determining a data field in each record corresponding to the data field with no data and selecting a value of the data field in each record as a candidate value;
Salhin further does not expressly disclose the data field with [n]o data.
However, Bayliss teaches the following:
each representative data record representing a cluster of similar data records, and each similar data record having a confidence score indicating a confidence level that the similar data record corresponds to the cluster; (Bayliss FIG. 5, col. 19, lines 11-58 describe linked entity references comprising data fields: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other; col. 19, lines 24-34 describes a confidence value assigned to the linkage: a probablistic process whereby a confidence value is assigned to a proposed link between two entity references based on the degree that the field values match optionally adjusted by a weighting factor [the linked references based on matching are interpreted as fulfilling the claimed 'cluster of similar data records']; see the entities being linked to each other in light of FIG. 8, col. 27, lines 31-53 and Table 9)
Bayliss further teaches:
selecting a set of representative data records having a generated distance score within a distance threshold... (Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file. A data field from the subset of data fields included in the entity references is selected (step 706) and the information or value in the selected data field of entity reference A is compared (step 708) to the information or value in the selected data field of entity reference B; col. 27, line 42-col. 28, line 42 and Table 9 describe a graph 800 graphically demonstrating two linked entity references; see then Bayliss col. 37, lines 10-25 showing that this can involve a distance: The "proximity" of entity references to each other may be represented graphically as the distance (i.e., variance) between the entity references at each extreme of each axis representing a selected data field)
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records… (Bayliss FIG. 5, col. 19, lines 10-58: comparing the value(s) of a subset of the data fields (selected during step 402, FIG. 4) of one entity reference are compared to the value(s) of another entity reference to determine the degree of similarity between the entity references. If the degree of similarity exceeds a certain threshold, the entity reference may be identified as related to the same entity and, therefore, linked to each other [relevant to the claimed 'cluster of similar data records']; see this in light of FIG. 8, col. 27, line 40-col. 28, line 12: Entity references 802-808 represent the entity references of Table 8 having RIDs 1-4, respectively. The arrows between the entity references 802-808 represent the matches determined by performing steps 704-710 in comparing first/middle/last name, SSN, and/or DOB data fields; see steps 704-710 performed in Bayliss FIG. 7, col. 26, lines 34-47: At step 704, an entity reference pair (entity reference A and entity reference B) are selected from the master file [shows representative data records]; After each of the subset of data fields of the entity reference pair is compared for a match and a record added to the match table 722 when a match occurs, a new entity reference pair may be selected from the master file and steps 706-710 may be repeated for the new entity reference pair) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Bayliss to assign confidence values to similar entity references to improve similar techniques within Salhin to determine similar data records.
In addition, both of the references (Salhin and Bayliss) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to apply the techniques of Bayliss regarding identifying related entity references and assigning a weight-adjustable confidence value to the techniques of Salhin regarding grouping of textually-matched data records. Further motivation would be that overcoming many of the limitations affecting known attempts at mining, interpreting and understanding data (as in Bayliss col. 2, line 62-col. 3, line 13) would have led one of ordinary skill to combine prior art reference teachings regarding similarity of data records to arrive at the claimed invention.
Salhin in view of Bayliss does not expressly disclose:
training a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records on specified fields of the data records;
Salhin in view of Bayliss further does not expressly disclose the bolded limitations seen below:
receiving a new data record comprising a data field with no data;
analyzing the cluster of similar data records corresponding to each representative data record in the set of representative data records to generate candidate values for the data field with no data of the new data record by determining a data field in each record corresponding to the data field with no data and selecting a value of the data field in each record as a candidate value;
Salhin in view of Bayliss further does not expressly disclose the data field with [n]o data.
However, Marcuccilli teaches the following:
Marcuccilli teaches the limitation "on specified fields of the data records." (Marcuccilli ¶ 0020: receiving a captured image of a transaction document; identifying text and any machine readable data on the image; extracting data fields from the image; populating a transaction record with transaction data fields from the extracted data fields; comparing and matching the transaction data fields of the transaction record with like data fields of a stored transaction record to determine if the transaction record is a complete transaction record or an incomplete transaction record)
Marcuccilli teaches receiving a new data record comprising a data field with no data; (Marcuccilli ¶ 0020: receiving a captured image of a transaction document; identifying text and any machine readable data on the image; extracting data fields from the image; populating a transaction record with transaction data fields from the extracted data fields; comparing and matching the transaction data fields of the transaction record with like data fields of a stored transaction record to determine if the transaction record is a complete transaction record or an incomplete transaction record; determining if the unmatched transaction data field from the incomplete transaction record is empty; FIG. 5, ¶ 0067-0075 describing processing a new transaction document, see particularly ¶ 0073-0075: A payment record is populated from the located data fields; If any payment data fields are empty or missing from the incomplete record)
Marcuccilli teaches generating candidate values for the data field with no data of the new data record by determining a data field in each record corresponding to the data field with no data and selecting a value of the data field in each record as a candidate value. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0067-0075 describing processing a new transaction document, see particularly ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data [shows claimed value of corresponding data field]; ¶ 0065 strengthens Marcuccilli addressing a plurality of records: This extracted data is then compared to (validated against) known information in a database that may include prior transaction information from a particular user of the system and/or a database of information from all users (payers and payees) within the system) 
Marcuccilli thus teaches the data field with [n]o data. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Marcuccilli involving deriving information from similar data records to improve similar techniques within Salhin as modified to determine similar data records.
In addition, both of the references (Salhin as modified and Marcuccilli) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as management of similar data entities.
Motivation to do so would be to use the techniques of Marcuccilli involving applying missing information derived from similar data records to a new record with the techniques of Salhin as modified regarding grouping of textually-matched data records. Motivation to do so would also be that allowing correction of invalid, misread, empty, or missing data (as in Marcuccilli ¶ 0074-0076) would have led one of ordinary skill to combine prior art reference teachings regarding similar data records to arrive at the claimed invention.
Salhin in view of Bayliss and Marcuccilli does not expressly disclose:
training a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records...
However, Psota teaches:
training a machine learning model to generate a trained machine learning model configured to generate a distance score between a new data record and each of a plurality of representative data records, the training using training data based on a history sourcing events data records... (Psota ¶ 0024 shows the claimed training: techniques for these methods and systems may include deep learning based on principles such as word vectors, semantic hashing, synthetic machine learning training sets and the like; word vectors, semantic hashing and the like may facilitate determining when terms that appear in transaction records indicate a degree of similarity and when those same terms do not; automobile manufacturer names can be close semantically within a group that they represent [shows claimed distance scoring]; an example includes Ford and Chevrolet. These two names may represent a group of U.S. automobile manufacturers, and therefore would suggest some degree of similarity in transaction records that include these terms || Psota FIG. 2, ¶ 0070-0085 describe processing data from a group of objects (e.g., a set of transactions, shipping records) [shows this being based on the claimed 'history sourcing events data records'], see particularly ¶ 0079 and ¶ 0082; ¶ 0079: performing (210) [should say (220)], for each source object in a group of source objects, one or more lookup operations on each target object in the target object index [shows comparisons between a new record and representative data records]; ¶ 0082: converting (230) each source target object pair into a numeric feature vector; the class of features includes one or more of a choice of matching attribute or data value (e.g. company name) from each of the two objects in the source target object pair, a string transformation function T, a string similarity function on the pair of transformed strings S. The overall feature is then a number f defined by f=S(T(source.attribute), T(target.attribute))) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Psota to determine semantic closeness between source target object pairs to improve similar techniques within Salhin as modified involving determining similar data records.
In addition, both of the references (Salhin as modified and Psota) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to use the techniques of Psota regarding deep learning and string similarity with the techniques of Salhin as modified regarding grouping of textually-matched data records. Motivation to do so would also be that implementing a system that can process transaction records and extract data from free-form data fields across a plurality of records may facilitate improving classification and the like of such records for platforms that process and utilize this information, which would have led one of ordinary skill to combine prior art reference teachings regarding similarity of data records to arrive at the claimed invention.

Regarding claims 2 and 11, Salhin in view of Bayliss and Marcuccilli and Psota teaches all the features with respect to claims 1 and 10 above including:
wherein the data field ... is a supplier name or catalog data field. (Bayliss FIG. 6, col. 25, lines 60-67: in many instances, only a portion of the data fields of the entity references are significant for matching purposes; for information pertaining to people, the following data fields have been found to be significant: SSN; first/middle/last name; street address, street name and state; vendor supplied IDs (i.e., IDs supplied by the data source). Similar significant data fields are often found in other types of databases)
Marcuccilli teaches the data field with no data. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data)


Regarding claims 3 and 12, Salhin in view of Bayliss and Marcuccilli and Psota teaches all the features with respect to claims 1 and 10 above including:
wherein the master data is generated by performing operations comprising: accessing a data source comprising a plurality of data records; (Bayliss col. 3, lines 14-16(13): data records from one or more data sources; FIG. 2, col. 14, lines 49-67: Process 200 typically initiates at preparation phase 202, wherein incoming data is received from one or more data source and formatted to be compatible with the format of the master file, where the master file represents the database upon which queries may be performed. The incoming data can include data from any of a variety of sources and have any of a variety of heterogeneous formats)
grouping the data records into clusters comprising similar data records; (Bayliss col. 3, lines 53-61: records (i.e., entity references); FIG. 2, col. 14, line 49-col. 15, line 37: During the link phase 204, the proposed entity references generated from the incoming data typically are merged into the master file. During this process, the proposed entity references may be linked to a particular entity using one or more matching techniques discussed in greater detail herein. If a proposed entity reference is associated with a pre-existing entity of the master file, the proposed entity reference may be assigned the DID of the entity. Otherwise, a new entity may be created and assigned a new, unique DID and this DID is assigned to the proposed entity reference)
generating a representative data record to represent each cluster of similar data records; (Bayliss col. 5, lines 57-61, "linking entity references to Definitive Identifiers (DIDs)"; see Bayliss col. 13, lines 37-60: each grouping of entity references typically would be assigned or associated with a unique Definitive Identifier (DID). The DID of a grouping would then be appended to each entity reference in the grouping; each entity reference is assigned a unique Reference Identifier (RID))
generating a confidence score for each similar data record indicating the confidence level that the similar data record corresponds to the cluster; and (Bayliss FIG. 15, col. 46, lines 10-58: At step 1508, the information in the selected data field for entity reference A may be compared to the corresponding information for entity reference B. If there is a match (i.e., equivalence or some acceptable degree of similarity), the score of the corresponding entity pair record in the score table 1522 may be adjusted (e.g., increased) at step 1510)
storing the master data comprising the plurality of representative data records in one or more data stores. (Bayliss describes its master file existing within a database in col. 60, lines 4-20: an exemplary database management system 1800 for processing queries to the master file and/or other databases; Bayliss also teaches this in col. 63, lines 22-53: consider a query to a state's motor vehicle registration database (one example of a master file) resulting in a list of all registered drivers who own a black automobile, sorted by last name; see all this in light of col. 18, lines 55-65: The DID may be associated with the entity reference, for example, by appending the DID as a DID data field to the record of the master file representing the entity reference, by creating a separate file having as records the RIDs of the entity references and their corresponding DIDs, and the like)

Regarding claims 4 and 13, Salhin in view of Bayliss and Marcuccilli and Psota teaches all the features with respect to claims 1 and 10 above including:
wherein the distance threshold is a specified value and the set of representative data records are selected based on each representative record of the set of representative data records having a distance score greater than the specified value for the distance threshold. (Salhin FIGs. 16-17, ¶ 0178 describe group descriptors as relevant to the claimed 'representative record': A method to determine whether to include a new data record 274 into an existing group as illustrated comprises: assigning a textual group descriptor and a temporal group descriptor to all data records 272 in a group; the textual group descriptor based on the textual similarity of data records 272 in the group... determining a textual fitting parameter based on the textual group descriptor... determining if a textual portion of a new data record 274 corresponds to the textual fitting parameter; FIG. 5, ¶ 0141-0144 describe how this textual similarity is based on a distance: applying one or more similarity metric methods to each pair of data records to obtain a textual similarity metric for each pair ... comprising one or more of: an edit distance for each pair of data records; a cosine distance based on a word count of the data records; and a cosine distance based on a word2vec analysis of the data records; ¶ 0148: Clusters of textually similar data records 214 may be stored in the textually matched data record log 216 with the data records having a textual similarity above a threshold 208 from the data record textual match analysis module 206)
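For illustration only, the threshold-based selection discussed above can be sketched as follows. The sketch uses a cosine similarity over word counts, one of the metric types Salhin lists, and borrows the example descriptors quoted from Salhin ¶ 0148. The function names, the new-record text, and the threshold of 0.6 are hypothetical and do not appear in any reference of record.

```python
import math
from collections import Counter

def cosine_score(text_a, text_b):
    """Cosine similarity between the word-count vectors of two descriptors."""
    counts_a = Counter(text_a.lower().split())
    counts_b = Counter(text_b.lower().split())
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_representatives(new_text, representatives, threshold):
    """Keep the representatives whose score is greater than the threshold."""
    return [rep for rep in representatives
            if cosine_score(new_text, rep) > threshold]

# Descriptors quoted from Salhin; the new record and threshold are assumed.
representatives = ["rent", "rental payments", "business rental",
                   "business premises rental"]
selected = select_representatives("business rental", representatives, 0.6)
```

Under these assumed inputs, only the descriptors sharing both words with the new record exceed the threshold.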

Regarding claims 5, 14, and 20, Salhin in view of Bayliss and Marcuccilli and Psota teaches all the features with respect to claims 1, 10, and 19 above respectively including:
wherein generating the candidate score for each of the candidate values ... using the distance score for the representative data record to which the candidate value corresponds and the confidence score for the similar data record to which the candidate value corresponds comprises: for each similar data record, multiplying the distance score by the confidence score to generate a multiplied value; (Bayliss FIG. 17, col. 56, line 53-col. 57, line 37: The match score of each record of the outlier match table 1722 may be multiplied by the confidence level assigned to the match rule that resulted in the creation of the record. Using the previous example, a record having a match score of 80 resulting from the match rule of "gender=male" would be multiplied by a confidence level of 95% to generate a resulting total score of 76; see that the 'match score' of Bayliss is relevant to the claimed 'distance score' in col. 37, lines 1-25: due to their close "proximity" and match characteristics, entity references 1202-1208 may be reasonably linked to entity A; The "proximity" of entity references to each other may be represented graphically as the distance (i.e., variance) between the entity references at each extreme of each axis representing a selected data field)
Marcuccilli teaches candidate values for the data field with no data. (Marcuccilli ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address, for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data)
for each candidate value, combining the multiplied value of each similar data record comprising the candidate value to generate a combined multiplied value; (Bayliss FIG. 17, col. 56, line 53-col. 57, line 38 describe the multiplied values: the additional semantic content of the data is applied at step 1708 by scoring the DIDs represented in the outlier match table 1722 based on the grading criteria. The match score of each record of the outlier match table 1722 may be multiplied by the confidence level assigned to the match rule that resulted in the creation of the record; Bayliss col. 57, lines 39-53 describe the combined multiplied value: At step 1710, the total match score for each entity reference associated with a particular score are, for instance, summed to generate an overall DID match score. The DID/DID match score pair are then added to a DID score table 1726)
determining a count of a number of similar data records in which each candidate value appears; and (Bayliss col. 57, lines 54-67 shows relevance to the claimed 'data records': Table 42 represents an exemplary master file generated from a motor vehicle registration database [this table is used to generate/derive subsequent Tables 43-46]; col. 58, line 20-col. 59, line 47, Tables 42-46 describe counts through its row numbers corresponding to DIDs, see especially Tables 45-46 (where DID #1 has four entries, DID #2 has two entries, etc.))
generating the candidate score for each of the candidate values based on the combined multiplied value and the count for the candidate value. (Bayliss col. 57, lines 39-53 describe scores for each DID: The DID/DID match score pair are then added to a DID score table 1726. At step 1712, the DID score table 1726 is filtered by a predetermined threshold value to obtain those DIDs of the DID score table 1726 having the highest total match score; col. 59, lines 1-47: Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs. In this example, Table 47 is filtered by taking only the highest scored DID: DID 1 having a total score of 162 (represented by row 1 of Table 44 [should say Table 47]); Bayliss teaches in col. 59 and Tables 46-47 a total score of 162 for DID in Table 47, derived from a summation of Total Scores 100, 30, and 32 [three of the four initial rows corresponding to DID #1] from Table 46, further derived by multiplying 100 by 100%, 60 by 50%, and 80 by 40% respectively)
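For illustration only, the arithmetic cited from Bayliss above (multiplying each match score by a confidence level, summing per DID, and counting the contributing rows) can be sketched as follows. The input values are those recited from Bayliss Tables 46-47; the function and variable names are hypothetical and do not appear in any reference of record.

```python
from collections import defaultdict

def combine_scores(rows):
    """Sum (match score x confidence %) per key and count contributing rows.

    rows: iterable of (key, match_score, confidence_percent) tuples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for key, match_score, confidence_pct in rows:
        totals[key] += match_score * confidence_pct / 100
        counts[key] += 1
    return dict(totals), dict(counts)

# Three rows for DID 1 as recited above: 100 at 100%, 60 at 50%, 80 at 40%.
rows = [("DID 1", 100, 100), ("DID 1", 60, 50), ("DID 1", 80, 40)]
totals, counts = combine_scores(rows)

# Single-row example recited above: a match score of 80 at 95% confidence.
single_total, _ = combine_scores([("example", 80, 95)])
```

The three products are 100, 30, and 32, which sum to the total score of 162 recited for DID 1.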

Regarding claims 6 and 15, Salhin in view of Bayliss and Marcuccilli and Psota teaches all the features with respect to claims 1 and 10 above including:
ranking the candidate values based on the candidate score; and (Bayliss col. 57, line 38 to col. 59, line 47: Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs; see Bayliss showing these DIDs corresponding to different entries in Table 42; see Bayliss ranking the DIDs by "Total Score" in Table 47, "162/148/140/120/110")
wherein the recommendation for the data field ... is provided based on the ranked candidate values. (Bayliss col. 57, lines 55-57: a law enforcement agency is seeking to identify a person in connection with a crime; col. 59, lines 21-47: At step 1714, the entity references from Table 45 associated with DID 1 may be used to identify the sought-after person. Table 48 illustrates these entity references. From this table, the law enforcement agency can ascertain that the sought person goes by the names "David Hobbson," "D Hoobson" and "David Yates." It also can be determined that this person resides in the zip code 33445 and possibly in the zip code 33555)
Marcuccilli teaches the data field with no data. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data)

Regarding claim 17, Salhin in view of Bayliss and Marcuccilli and Psota teaches all the features with respect to claim 15 above including:
wherein providing the recommendation for the data field ... based on the ranked candidate values comprises providing a subset of the candidate values with a highest candidate score based on a predetermined number of recommendations to be provided. (Bayliss col. 57, lines 41-54 describe a predetermined number: At step 1712, the DID score table 1726 is filtered by a predetermined threshold value to obtain those DIDs of the DID score table 1726 having the highest total match score. The threshold may be set at a certain minimum total match score, as a certain top percentage, as a certain number of the highest total match scores [shows the claimed 'number of recommendations'], and the like || Bayliss col. 59, lines 27-47 describe providing output associated with the DID having the highest score [a score of '162' in the example]: Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs. In this example, Table 47 is filtered by taking only the highest scored DID: DID 1 having a total score of 162 (represented by row 1 of Table 44 [should say Table 47]). At step 1714, the entity references from Table 45 associated with DID 1 may be used to identify the sought-after person. Table 48 illustrates these entity references)
Marcuccilli teaches the data field with no data. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data)
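
For illustration only, filtering a score table to "a certain number of the highest total match scores" can be sketched as follows. This is a minimal sketch under stated assumptions: the function name top_recommendations is hypothetical, the five totals are those cited from Table 47, and every label other than "DID 1" is a placeholder because the action does not identify the other DIDs.

```python
def top_recommendations(scores, n):
    """Return the n highest-scored entries, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]

# Totals as cited from Table 47; labels other than "DID 1" are placeholders.
scores = {
    "placeholder-c": 120,
    "DID 1": 162,
    "placeholder-a": 148,
    "placeholder-d": 110,
    "placeholder-b": 140,
}

print(top_recommendations(scores, 1))  # [('DID 1', 162)]
```

With n set to 1 the sketch corresponds to the example in which only the highest scored DID is kept; a larger n returns a subset of that predetermined size.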

Regarding claims 8 and 18, Salhin in view of Bayliss and Marcuccilli and Psota teaches all the features with respect to claims 1 and 10 above including:
wherein providing the recommendation for the data field ... based on the candidate scores for the candidate values comprises providing the candidate value with a highest candidate score as the recommendation for the data field... (Bayliss col. 57, lines 38-67: At step 1712, the DID score table 1726 is filtered by a predetermined threshold value to obtain those DIDs of the DID score table 1726 having the highest total match score; a law enforcement agency is seeking to identify a person in connection with a crime; Table 43 represents an exemplary match table 1730 generated from Table 42 as a result of steps 702-710 of method 700 (FIG. 7) using name matches and zip code matches; see then Bayliss col. 59, lines 27-47 providing output associated with the DID having the highest score [162 in the example]: Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs. In this example, Table 47 is filtered by taking only the highest scored DID: DID 1 having a total score of 162 (represented by row 1 of Table 44 [should say Table 47]). At step 1714, the entity references from Table 45 associated with DID 1 may be used to identify the sought-after person. Table 48 illustrates these entity references. From this table, the law enforcement agency can ascertain that the sought person goes by the names "David Hobbson," "D Hoobson" and "David Yates." It also can be determined that this person resides in the zip code 33445 and possibly in the zip code 33555) 
Marcuccilli teaches the data field with no data. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data)

Claims 7, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Salhin in view of Bayliss and Marcuccilli and Psota in further view of Balakrishnan et al., U.S. Patent Application Publication No. 2020/0151663 (published May 14, 2020, prior to the instant application date of July 2, 2020; hereinafter Balakrishnan).

Regarding claims 7 and 16, Salhin in view of Bayliss and Marcuccilli and Psota teaches all the features with respect to claims 6 and 15 above respectively including:
wherein ranking the candidate values based on the candidate score comprises ranking the candidate values first based on candidate score, (Bayliss col. 57, lines 39-53: At step 1712, the DID score table 1726 is filtered by a predetermined threshold value to obtain those DIDs of the DID score table 1726 having the highest total match score; see this in action in Bayliss Table 47, col. 59, lines 21-47: At step 1710, the total match scores for each of the entity references of a certain DID in Table 46 may be combined to arrive at an overall DID match value for the DID. Table 47 illustrates an exemplary sorted DID score table 1726 resulting from Table 46; Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs)
second based on cluster count indicating a number of clusters in which the candidate value appears, (Bayliss FIG. 15, col. 46, lines 30-50 describe its scoring being performed based on a number of appearances: At step 1508, the information in the selected data field for entity reference A may be compared to the corresponding information for entity reference B. If there is a match (i.e., equivalence or some acceptable degree of similarity), the score of the corresponding entity pair record in the score table 1522 may be adjusted (e.g., increased) at step 1510. For example, if the first name of entity reference A (having, for example, DID 1) matched the first name of entity reference B (having, for example, DID 3) then the score of the DID 3:DID 1 record in the score table 1522 would be adjusted by a determined or predetermined amount. The score adjustment may be constant regardless of the type of match or the magnitude of the score adjustment may be related to the type or degree of match (e.g., matching SSNs may be given a higher match score than matching first names))
Salhin in view of Bayliss and Marcuccilli and Psota does not expressly disclose:
and third based on record count indicating a number of similar records in which the candidate value appears.
However, Balakrishnan teaches this by teaching the following:
ranking the candidate values … third based on record count indicating a number of similar records in which the candidate value appears. (Balakrishnan FIG. 2B, ¶ 0041: Similarity scores are calculated for the historical product data records that satisfy the threshold and the historical product data records are ranked accordingly. Similarity scores between two products can be calculated using a variety of techniques. One approach is to count the number of attributes identical between the two (or more) products. For each identical attribute, the total number of possible values is summed up to represent the similarity score for that attribute. For example, the attribute “material” can have 100 different possible values. Two products having an identical value of “material: Cotton”, will contribute a value of 100 towards the similarity score, to indicate a strong signal of similarity)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to use known techniques within Balakrishnan to rank its data records based on calculated similarity scores to improve similar techniques within Salhin as modified by Bayliss to rank its data records based on calculated scores.
In addition, both of the references (Salhin as modified by Bayliss and Balakrishnan) disclose features that are directed to analogous art, and they are directed to the same field of endeavor, such as scoring determined similarity of data entities.
Motivation to do so would be to apply the teaching of Balakrishnan, in which similarity scores may be based on a variety of techniques including potential diversity in attribute values, to the techniques of Salhin as modified by Bayliss regarding match scores to return a desired entity. A further motivation would be that the use of machine learning techniques to predict and estimate information absent from the input data would have led one of ordinary skill in the art to combine the prior art teachings regarding similarity of data records to arrive at the claimed invention.
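
For illustration only, the attribute-based similarity score described in the cited Balakrishnan passage can be sketched as follows. This is a minimal sketch, not Balakrishnan's implementation: the function name similarity_score, the dictionary layout, and the "color" attribute with its count of possible values are assumptions; only the "material" attribute with 100 possible values comes from the cited passage.

```python
def similarity_score(product_a, product_b, possible_values):
    """Sum, over attributes with identical values, the number of possible values."""
    score = 0
    for attribute, value in product_a.items():
        if product_b.get(attribute) == value:
            # An identical value on a highly varied attribute is a stronger signal.
            score += possible_values[attribute]
    return score

# "material" with 100 possible values is from the cited passage;
# "color" and its count are hypothetical.
possible_values = {"material": 100, "color": 12}
a = {"material": "Cotton", "color": "Blue"}
b = {"material": "Cotton", "color": "Red"}

print(similarity_score(a, b, possible_values))  # 100
```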

Regarding claim 9, similar to claim 17, Salhin in view of Bayliss and Marcuccilli and Psota teaches all the features with respect to claim 7 above including:
wherein providing the recommendation for the data field ... based on the ranked candidate values comprises providing a subset of the candidate values with highest candidate score based on a predetermined number of recommendations to be provided. (Bayliss col. 57, lines 41-54 describe a predetermined number: At step 1712, the DID score table 1726 is filtered by a predetermined threshold value to obtain those DIDs of the DID score table 1726 having the highest total match score. The threshold may be set at a certain minimum total match score, as a certain top percentage, as a certain number of the highest total match scores [shows the claimed 'number of recommendations'], and the like || Bayliss col. 59, lines 27-47 describe providing output associated with the DID having the highest score [a score of '162' in the example]: Table 47 then may be filtered (step 1712) to obtain the highest scored DIDs. In this example, Table 47 is filtered by taking only the highest scored DID: DID 1 having a total score of 162 (represented by row 1 of Table 44 [should say Table 47]). At step 1714, the entity references from Table 45 associated with DID 1 may be used to identify the sought-after person. Table 48 illustrates these entity references) 
Marcuccilli teaches the data field with no data. (Marcuccilli ¶ 0020: determining if the unmatched transaction data field from the incomplete transaction record is empty, and inputting data into the empty data field to match the like data field of the stored transaction record; FIG. 5, ¶ 0075: If any payment data fields are empty or missing from the incomplete record, the payment field is filled by inputting data. If an account number is found but no address [no data], for example, the payment can be further processed from the account number using data from a previous payment record to input the missing address data)
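
For illustration only, the ordering recited in claims 7 and 9 (first by candidate score, second by cluster count, third by record count, followed by selection of a predetermined number of recommendations) can be sketched as follows. This is a minimal sketch of the claim language and does not represent any cited reference; the Candidate type, the function name rank_candidates, and all sample values are hypothetical.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    value: str
    score: int
    cluster_count: int
    record_count: int

def rank_candidates(candidates, n):
    """Rank by score, then cluster count, then record count; keep the top n values."""
    ranked = sorted(
        candidates,
        key=lambda c: (c.score, c.cluster_count, c.record_count),
        reverse=True,
    )
    return [c.value for c in ranked[:n]]

# Hypothetical candidates; A, B, and C tie on score so the later keys decide.
candidates = [
    Candidate("A", 90, 2, 5),
    Candidate("B", 90, 3, 4),
    Candidate("C", 90, 3, 7),
    Candidate("D", 95, 1, 1),
]

print(rank_candidates(candidates, 3))  # ['D', 'C', 'B']
```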

Response to Arguments
Applicant’s arguments, see p10, filed 08/22/2022, with respect to the 35 U.S.C. 103 rejection of claims 1-20 have been fully considered and are persuasive.  The 35 U.S.C. 103 rejection of claims 1-20 has been withdrawn. 

Applicant’s arguments, see p10, filed 08/22/2022, with respect to the 35 U.S.C. 112(f) interpretation of claims 1-20 have been fully considered. While the claims were not rejected under 35 U.S.C. 112, the claims are no longer being interpreted under 35 U.S.C. 112(f).

Applicant’s remaining arguments, see p12, filed 08/22/2022, with respect to the rejection(s) of claim(s) 1, 10, and 20 under 35 U.S.C. 103 have been fully considered but are not persuasive.


(c)
Regarding claims 1, 10, and 20, Applicant argues that Salhin does not do any comparison between a new data record and each “textual group descriptor.”
In response to Applicant’s arguments, Salhin is capable of performing the comparison between a new data record and each textual group descriptor as required by the claims as currently structured. Salhin ¶ 0148 refers to an ideal embodiment where a cluster is provided a group textual descriptor to encompass the entirety of the individual data records; however, the individual data records themselves do have respective textual descriptors (such as “rent,” “business rental,” etc.) and textual similarity above a threshold is determined for a plurality of data records.

(d)
Regarding claims 1, 10, and 20, Applicant argues that Bayliss does not describe selecting a set of “representative data records” as claimed, because Bayliss does not describe any representative data record that “represent[s] a cluster of similar data records.”

In response to Applicant’s arguments, Bayliss, in cited FIG. 7 and col. 26, lines 34-47, selects an entity reference pair, selects a data field of entity reference A, and selects a data field from entity reference B. As relied upon to address the limitation, Bayliss represents the proximity of entity references to each other graphically. The entity references can be seen in FIGs. 1A-1B and col. 12, line 46-col. 13, line 60 as involving groupings of entity references that possibly represent the same entity. With this in mind, Bayliss can be seen to show the required association with a cluster of similar data records.
As a result, Bayliss shows at least “selecting a set of representative data records having a generated distance score within a distance threshold.”


Applicant’s remaining arguments, see p11, p12(d), and p13, filed 08/22/2022, with respect to the rejection(s) of claim(s) 1, 10, and 20 under 35 U.S.C. 103 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  However, upon further consideration, a new ground(s) of rejection is made under 35 U.S.C. 103 as being unpatentable over Salhin in view of Bayliss and newly incorporated reference Marcuccilli and newly incorporated reference Psota.

The dependent claims remain rejected at least by virtue of their dependence on rejected base claims.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 




Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEDIDIAH P FERRER whose telephone number is (571)270-7695. The examiner can normally be reached Monday-Friday 11:00am-7:00pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached at (571)272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/J.P.F/ Examiner, Art Unit 2164
December 2, 2022

/ASHISH THOMAS/Supervisory Patent Examiner, Art Unit 2164