DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to communication filed on 6 January 2022. Claims 1-20 are pending in the case. Claims 1, 6, 16, and 19 were amended. Claims 1, 16, and 19 are the independent claims. This action is non-final. 

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on January 6th, 2022 has been entered.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-4, 6, 8-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Cassidy et al. (US 2017/0308557 A1) in view of Lang et al. (US 2003/0225770 A1), further in view of Thomas et al. (US 2016/0048770 A1).
Regarding claim 1, Cassidy teaches a method:
receiving transactional data from a plurality of databases, the transactional data being received in each database from a plurality of sources and comprising a plurality of input records, each input record comprising a string representation of an entity having a particular location and including a plurality of string components, wherein input records from different sources represent particular entities having corresponding locations using a plurality of different string representations (see Cassidy, Paragraph [0002], “The data is gathered from a variety of different data sources and is electronically stored in various formats as records in databases. Examples of data sources may include, but are not limited to, employee database, sales database, contact center database, offline records, customer escalation records, company's social media followers records, customer query records and mailing lists records.”);
mapping string components of one or more of the string representations in each input record into one or more corresponding master string components (see Cassidy, Paragraph [0035], “The cleansing module 204 standardizes the data included in the database (say the database 106).”);

However, Cassidy does not explicitly teach:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record;

Lang teaches:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record (see Lang, Paragraph [0045], “a database administrator may use a database string search or other query function to identify master records that are similar but not identical in one or more computer systems.”);

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Cassidy (teaching method and system for cleansing and de-duplicating data) in view of Lang (teaching collaborative data cleansing), and arrived at a method that incorporates a master record search engine. One of ordinary skill in the art would have been motivated to make such a combination for the purposes of improving data cleansing (see Lang, Paragraph [0044]). In addition, both the references (Cassidy and Lang) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as data cleansing. The close relation between both of the references highly suggests an expectation of success.

However, the combination of Cassidy and Lang does not explicitly teach:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record;

Thomas teaches:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record (see Thomas, Paragraph [0031], “The tokenizers 164-166 in entity matching component 152 then normalize the data by generating a normalized token 168-170 for each record.”);
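For illustration only, and without characterizing the disclosure of any cited reference or of the application, the tokenize-and-search limitation recited above can be sketched as follows. The function names, the in-memory inverted index, the shared-token ranking, and the sample records are all assumptions introduced for this sketch.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(master_records):
    """Map each token to the ids of the master records that contain it."""
    index = defaultdict(set)
    for record_id, text in master_records.items():
        for token in tokenize(text):
            index[token].add(record_id)
    return index


def search(index, input_text, top_k=3):
    """Return up to top_k master record ids ranked by shared-token count."""
    hits = defaultdict(int)
    for token in set(tokenize(input_text)):
        for record_id in index.get(token, ()):
            hits[record_id] += 1
    # Highest count first; ties are broken by record id for determinism.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [record_id for record_id, _ in ranked[:top_k]]


# Hypothetical master records, for illustration only.
MASTERS = {
    "m1": "Acme Hardware, 12 Main Street, Springfield",
    "m2": "Acme Hardware, 98 Oak Avenue, Shelbyville",
    "m3": "Bolt Supply Co, 12 Main Street, Springfield",
}
INDEX = build_index(MASTERS)
```

Under these assumptions, a call such as search(INDEX, "ACME Hardware 12 Main St. Springfield") returns the master record ids ordered from most to fewest shared tokens.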

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Cassidy (teaching method and system for cleansing and de-duplicating data) in view of Lang (teaching collaborative data cleansing), further in view of Thomas (teaching entity resolution incorporating data from various data sources), and arrived at a method that tokenizes records. One of ordinary skill in the art would have been motivated to make such a combination for the purposes of determining matching records (see Thomas, Paragraph [0024]). In addition, the references (Cassidy, Lang and Thomas) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as data cleansing. The close relation between the references highly suggests an expectation of success.

The combination of Cassidy, Lang, and Thomas further teaches:
and for each input record in the plurality of input records, using a machine learning algorithm to select a master record in the plurality of master records that matches the input record based on a training set, the master string representations in the master records and the string representations in the input record (see Cassidy, Paragraphs [0035], [0043]-[0046], “The machine learning algorithm module 212 analyses the labeled vector and classifies the remaining non labeled vector. The set of labeled vectors is used as a training and test set for a machine learning model for classification of unlabeled vector… the machine learning algorithm module 212 classifies the remaining vectors and returns a confidence level with each label... The cluster creation module 216 then analyzes the match-pairs to create clusters… The cluster creation module 216 processes all match-pairs and thereafter creates clusters. Master records are identified using defined rules. For example, the most complete record may be considered as the master record in each cluster.”).

Regarding claim 2, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 1. Cassidy further teaches:
regular expression matching one or more string components of one or more string representations to a plurality of predetermined string representations; and determining if one or more string components meets a predetermined pattern corresponding to a particular string component (see Cassidy, Paragraph [0006], “the records in each cluster are merged to obtain a de-duplicated cleansed database using predefined consolidated rules.”).
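As a non-authoritative illustration of the regular expression matching recited in claim 2, the following sketch tests whether a string component meets a predetermined pattern. The two patterns (a US-style zip code and a ten-digit phone number) and the function name are hypothetical and are not drawn from the application or from Cassidy.

```python
import re

# Hypothetical patterns; real formats vary by country and data source.
PATTERNS = {
    "zip_code": re.compile(r"\d{5}(?:-\d{4})?"),
    "phone": re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
}


def classify_component(component):
    """Return the name of the first pattern the whole component matches, else None."""
    text = component.strip()
    for name, pattern in PATTERNS.items():
        if pattern.fullmatch(text):
            return name
    return None
```

A component that matches no pattern returns None, which a caller could treat as a free-text component such as a name or street address.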

Regarding claim 3, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 1. Cassidy further teaches:
wherein the mapped string components comprise one or more of a name string, an address string, a phone number string, and a zip code string (see Cassidy, Paragraph [0035], “all fields in the data are converted to upper case or lower case, special characters is removed, fields such as phone and zip code are standardized based on geographical location or address, and the like.”).
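The kind of standardization described in the quoted passage (case conversion and removal of special characters) can be sketched as below. This is a minimal example under stated assumptions: the choice of lower case, the ASCII-only character class, and the whitespace collapsing are illustrative and are not taken from Cassidy.

```python
import re


def standardize(value):
    """Lowercase a field, drop special characters, and collapse whitespace."""
    lowered = value.lower()
    # Assumption: anything other than ASCII letters, digits, and spaces is "special".
    cleaned = re.sub(r"[^a-z0-9 ]", "", lowered)
    return " ".join(cleaned.split())
```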

Regarding claim 4, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 1. Cassidy further teaches:
storing the input records in a distributed file storage system; and prior to the mapping, ordering the records by order of occurrence (see Cassidy, Paragraph [0073], “While the machine-readable medium 622 is shown in an example embodiment to be a single medium, the term “machine-readable medium” may include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more instructions 624 or data structures.”).

Regarding claim 6, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 1. Cassidy further teaches:
wherein the plurality of master records retrieved for each input record are an ordered set of matching master data records (see Cassidy, Paragraph [0037], [0041], “if the records ‘i’ and ‘j’ are being compared then ‘kth’ component of the similarity vector (wherein i, j and k are numeric values) gives the string similarity score for the strings in field ‘k’ of records ‘i’ and ‘j’. In an embodiment, a similarity score of 1 indicates an exact match and a score of 0 indicates non-match (complete dissimilarity). In an embodiment, a score between ‘0.85’ and ‘0.99’ is considered to be a close match.” Also, see Lang, Paragraph [0045], “a database administrator may use a database string search or other query function to identify master records that are similar but not identical in one or more computer systems.”).
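For illustration, a per-field similarity vector of the kind quoted above, and an ordering of candidate master records by that vector, might look like the following. The sketch uses Python's difflib.SequenceMatcher as a stand-in similarity measure; the field names, the mean-score ranking, and the "unclassified" label for scores the quoted passage does not address are assumptions.

```python
from difflib import SequenceMatcher

FIELDS = ("name", "address", "phone")  # hypothetical field names


def similarity_vector(record_i, record_j):
    """Return one similarity score in [0, 1] per field, in FIELDS order."""
    return [
        SequenceMatcher(None, record_i[field], record_j[field]).ratio()
        for field in FIELDS
    ]


def label(score, close_threshold=0.85):
    """Bucket a score using the ranges quoted from Cassidy."""
    if score == 1.0:
        return "exact match"
    if score >= close_threshold:
        return "close match"
    if score == 0.0:
        return "non-match"
    return "unclassified"  # the quoted passage is silent on this range


def rank_candidates(input_record, candidates):
    """Order candidate master records from most to least similar (mean field score)."""
    def mean_score(candidate):
        vector = similarity_vector(input_record, candidate)
        return sum(vector) / len(vector)
    return sorted(candidates, key=mean_score, reverse=True)
```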


wherein the training set comprises a distance measure of the top three (3) corresponding similarity search scores (see Cassidy, Paragraph [0042], “The different matching methods may include, but are not limited to, vector spacing method, distance metric method, absolute or relative difference methods, and the like.”).

Regarding claim 9, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 1. Cassidy further teaches:
determining a plurality of ratios of a first value to a second value, wherein the first value is an overlap of bigrams between one or more string components in the input record and corresponding string components in the most similar master record, and wherein the second value is a total number of bigrams in the one or more string components in the input record and the string components in the most similar master record; generating a plurality of weights based on the training set; and determining the final score based on a logical regression, the logical regression comprising a linear combination of a product of the weights and the ratios (see Cassidy, Paragraph [0030], “Pair-wise comparisons are made in the cleansed data to label the pairs as a match or a non-match” Paragraph [0051], “The machine learning algorithm identifies patterns in the data that characterize duplicate pairs of records. The matches, based on these rules, could be termed as ‘close match’, ‘exact match’, ‘non-match’ and the like. The fields marked as ‘close match’ are the records which are not the exact match but could be inferred as same on analysis.”).
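The scoring recited in claim 9 can be illustrated with the sketch below. Several assumptions are made and should be read as such: bigrams are character bigrams counted as multisets, the "total number of bigrams" is the sum of the bigram counts of both strings (so identical strings yield a ratio of 0.5), the claimed "logical regression" is read as a logistic function of the weighted sum, and the weights are supplied by the caller rather than learned from a training set.

```python
import math
from collections import Counter


def bigrams(text):
    """Count character bigrams of the lowercased string."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def overlap_ratio(a, b):
    """Shared bigrams divided by the total bigrams in both strings."""
    ca, cb = bigrams(a), bigrams(b)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0
    overlap = sum((ca & cb).values())  # multiset intersection
    return overlap / total


def final_score(ratios, weights, bias=0.0):
    """Logistic function of a weighted sum of the ratios."""
    z = bias + sum(w * r for w, r in zip(weights, ratios))
    return 1.0 / (1.0 + math.exp(-z))
```

In a fuller implementation the weights would be fitted to labeled match and non-match pairs; here they are plain inputs so that the arithmetic of the linear combination stays visible.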

Regarding claim 10, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 1. Cassidy further teaches:
wherein scores from the similarity search are integrated into weights of the machine learning algorithm (see Cassidy, Paragraphs [0037], [0041]-[0044], [0053], “if the records ‘i’ and ‘j’ are being compared then ‘kth’ component of the similarity vector (wherein i, j and k are numeric values) gives the string similarity score for the strings in field ‘k’ of records ‘i’ and ‘j’. In an embodiment, a similarity score of 1 indicates an exact match and a score of 0 indicates non-match (complete dissimilarity). In an embodiment, a score between ‘0.85’ and ‘0.99’ is considered to be a close match… The machine learning algorithm module 212 analyses the labeled vector and classifies the remaining non labeled vector. The set of labeled vectors is used as a training and test set for a machine learning model for classification of unlabeled vector.”).

Regarding claim 11, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 1. Cassidy further teaches:
wherein if the final score is greater than a threshold, data from the input record is automatically merged into the most similar master record (see Cassidy, Paragraph [0062], “the processor 404 identifies a master record in each cluster of records. Subsequently, the processor 404 merges records in each cluster to obtain a de-duplicated cleansed database using predefined consolidated rules.”).

Regarding claim 12, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 11. Cassidy further teaches:
specifying one or more string components of the input records; and if the final score for a particular input record is above the threshold, comparing the specified string components in the particular input record to the corresponding string components in the most similar master record, and rejecting the input record if the specified string components do not match even when final score is above the threshold (see Cassidy, Paragraph [0055], “At step 316, the records in each cluster are merged to obtain a de-duplicated cleansed database using predefined consolidated rules”).
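The thresholded merge of claim 11 and the field-level override of claim 12 can be sketched together as follows. The threshold value, the choice of zip_code as the specified component, the "no_merge" outcome for scores at or below the threshold, and the fill-only-empty-fields merge policy are hypothetical choices made for this example.

```python
def decide(input_record, master_record, final_score, threshold=0.9,
           required_fields=("zip_code",)):
    """Return 'merge', 'reject', or 'no_merge' for one input/master pair."""
    if final_score <= threshold:
        return "no_merge"
    for field in required_fields:
        if input_record.get(field) != master_record.get(field):
            # Veto even though the score cleared the threshold.
            return "reject"
    return "merge"


def merge(input_record, master_record):
    """Copy input values into a new master record where the master has none."""
    merged = dict(master_record)
    for field, value in input_record.items():
        if value and not merged.get(field):
            merged[field] = value
    return merged
```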

Regarding claim 13, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 11. Cassidy further teaches:
wherein if the final score is greater than a threshold, generating a difference record comprising one or more string components that are different between the input record and the most similar master record (see Cassidy, Paragraph [0062], “the processor 404 identifies a master record in each cluster of records. Subsequently, the processor 404 merges records in each cluster to obtain a de-duplicated cleansed database using predefined consolidated rules.”).

Regarding claim 14, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 13. Cassidy further teaches:
storing a plurality of the difference records for a corresponding plurality of input records and master records; and applying the difference records to the master records as each input record is processed, and in accordance therewith, supplementing data in the master records (see Cassidy, Paragraph [0062], “the processor 404 identifies a master record in each cluster of records. Subsequently, the processor 404 merges records in each cluster to obtain a de-duplicated cleansed database using predefined consolidated rules.”).

Regarding claim 15, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 14. Cassidy further teaches:
rolling back the master records based on the difference records (see Cassidy, Paragraph [0062], “the processor 404 identifies a master record in each cluster of records. Subsequently, the processor 404 merges records in each cluster to obtain a de-duplicated cleansed database using predefined consolidated rules.”).
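The difference records of claims 13 through 15 (generating, applying, and rolling back) can be illustrated by the sketch below. The representation of a difference record as a mapping from field name to an (old value, new value) pair is an assumption, as is the convention that an old value of None means the field was absent from the master record.

```python
def make_diff(input_record, master_record):
    """Record, per differing field, the master's old value and the input's new value."""
    diff = {}
    for field, new_value in input_record.items():
        old_value = master_record.get(field)
        if new_value != old_value:
            diff[field] = (old_value, new_value)
    return diff


def apply_diff(master_record, diff):
    """Return a copy of the master record with the new values applied."""
    updated = dict(master_record)
    for field, (_, new_value) in diff.items():
        updated[field] = new_value
    return updated


def roll_back(master_record, diff):
    """Return a copy of the master record with the old values restored."""
    restored = dict(master_record)
    for field, (old_value, _) in diff.items():
        if old_value is None:
            restored.pop(field, None)  # assumption: None means the field was absent
        else:
            restored[field] = old_value
    return restored
```

Keeping the old value in each difference record is what makes the rollback possible without a separate snapshot of the master record.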

Regarding claim 16, Cassidy teaches a non-transitory machine-readable medium storing a program executable by at least one processing unit of a computer, the program comprising sets of instructions for: 
receiving transactional data from a plurality of databases, the transactional data being received in each database from a plurality of sources and comprising a plurality of input records, each input record comprising a string representation of an entity having a particular location and including a plurality of string components, wherein input records from different sources represent particular entities having corresponding locations using a plurality of different string representations (see Cassidy, Paragraph [0002], “The data is gathered from a variety of different data sources and is electronically stored in various formats as records in databases. Examples of data sources may include, but are not limited to, employee database, sales database, contact center database, offline records, customer escalation records, company's social media followers records, customer query records and mailing lists records.”);
mapping string components of one or more of the string representations in each input record into one or more corresponding master string components (see Cassidy, Paragraph [0035], “The cleansing module 204 standardizes the data included in the database (say the database 106).”);

However, Cassidy does not explicitly teach:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record;

Lang teaches:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record (see Lang, Paragraph [0045], “a database administrator may use a database string search or other query function to identify master records that are similar but not identical in one or more computer systems.”);

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Cassidy (teaching method and system for cleansing and de-duplicating data) in view of Lang (teaching collaborative data cleansing), and arrived at a machine that incorporates a master record search engine. One of ordinary skill in the art would have been motivated to make such a combination for the purposes of improving data cleansing (see Lang, Paragraph [0044]). In addition, both the references (Cassidy and Lang) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as data cleansing. The close relation between both of the references highly suggests an expectation of success.

However, the combination of Cassidy and Lang does not explicitly teach:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record;

Thomas teaches:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record (see Thomas, Paragraph [0031], “The tokenizers 164-166 in entity matching component 152 then normalize the data by generating a normalized token 168-170 for each record.”);

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Cassidy (teaching method and system for cleansing and de-duplicating data) in view of Lang (teaching collaborative data cleansing), further in view of Thomas (teaching entity resolution incorporating data from various data sources), and arrived at a machine that tokenizes records. One of ordinary skill in the art would have been motivated to make such a combination for the purposes of determining matching records (see Thomas, Paragraph [0024]). In addition, the references (Cassidy, Lang and Thomas) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as data cleansing. The close relation between the references highly suggests an expectation of success.

The combination of Cassidy, Lang, and Thomas further teaches:
and for each input record in the plurality of input records, using a machine learning algorithm to select a master record in the plurality of master records that matches the input record based on a training set, the master string representations in the master records and the string representations in the input record (see Cassidy, Paragraphs [0035], [0043]-[0046], “The machine learning algorithm module 212 analyses the labeled vector and classifies the remaining non labeled vector. The set of labeled vectors is used as a training and test set for a machine learning model for classification of unlabeled vector… the machine learning algorithm module 212 classifies the remaining vectors and returns a confidence level with each label... The cluster creation module 216 then analyzes the match-pairs to create clusters… The cluster creation module 216 processes all match-pairs and thereafter creates clusters. Master records are identified using defined rules. For example, the most complete record may be considered as the master record in each cluster.”).


determining a plurality of ratios of a first value to a second value, wherein the first value is an overlap of bigrams between one or more string components in the input record and corresponding string components in the most similar master record, and wherein the second value is a total number of bigrams in the one or more string components in the input record and the string components in the most similar master record; generating a plurality of weights based on the training set; and determining the final score based on a logical regression, the logical regression comprising a linear combination of a product of the weights and the ratios (see Cassidy, Paragraph [0030], “Pair-wise comparisons are made in the cleansed data to label the pairs as a match or a non-match” Paragraph [0051], “The machine learning algorithm identifies patterns in the data that characterize duplicate pairs of records. The matches, based on these rules, could be termed as ‘close match’, ‘exact match’, ‘non-match’ and the like. The fields marked as ‘close match’ are the records which are not the exact match but could be inferred as same on analysis.”).

Regarding claim 19, Cassidy teaches a computer system comprising:
a processor; a memory; and non-transitory machine-readable medium storing a program executable by the processor, the program comprising sets of instructions for: receiving transactional data from a plurality of databases, the transactional data being received in each database from a plurality of sources and comprising a plurality of input records, each input record comprising a string representation of an entity having a particular location and including a plurality of string components, wherein input records from different sources represent particular entities having corresponding locations using a plurality of different string representations (see Cassidy, Paragraph [0002], “The data is gathered from a variety of different data sources and is electronically stored in various formats as records in databases. Examples of data sources may include, but are not limited to, employee database, sales database, contact center database, offline records, customer escalation records, company's social media followers records, customer query records and mailing lists records.”);
mapping string components of one or more of the string representations in each input record into one or more corresponding master string components (see Cassidy, Paragraph [0035], “The cleansing module 204 standardizes the data included in the database (say the database 106).”);

However, Cassidy does not explicitly teach:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record;

Lang teaches:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record (see Lang, Paragraph [0045], “a database administrator may use a database string search or other query function to identify master records that are similar but not identical in one or more computer systems.”);

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Cassidy (teaching method and system for cleansing and de-duplicating data) in view of Lang (teaching collaborative data cleansing), and arrived at a system that incorporates a master record search engine. One of ordinary skill in the art would have been motivated to make such a combination for the purposes of improving data cleansing (see Lang, Paragraph [0044]). In addition, both the references (Cassidy and Lang) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as data cleansing. The close relation between both of the references highly suggests an expectation of success.

However, the combination of Cassidy and Lang does not explicitly teach:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record;

Thomas teaches:
for each input record in the plurality of input records, dividing the string representations in the input record into a set of tokens, searching a master record data store based on the set of tokens and tokens generated from master string representations of master records stored in the master record data store, and retrieving, based on the search, a plurality of master records comprising master string representations are most similar to string representations in the input record (see Thomas, Paragraph [0031], “The tokenizers 164-166 in entity matching component 152 then normalize the data by generating a normalized token 168-170 for each record.”);

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Cassidy (teaching method and system for cleansing and de-duplicating data) in view of Lang (teaching collaborative data cleansing), further in view of Thomas (teaching entity resolution incorporating data from various data sources), and arrived at a system that tokenizes records. One of ordinary skill in the art would have been motivated to make such a combination for the purposes of determining matching records (see Thomas, Paragraph [0024]). In addition, the references (Cassidy, Lang and Thomas) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as data cleansing. The close relation between the references highly suggests an expectation of success.

The combination of Cassidy, Lang, and Thomas further teaches:
and for each input record in the plurality of input records, using a machine learning algorithm to select a master record in the plurality of master records that matches the input record based on a training set, the master string representations in the master records and the string representations in the input record (see Cassidy, Paragraphs [0035], [0043]-[0046], “The machine learning algorithm module 212 analyses the labeled vector and classifies the remaining non labeled vector. The set of labeled vectors is used as a training and test set for a machine learning model for classification of unlabeled vector… the machine learning algorithm module 212 classifies the remaining vectors and returns a confidence level with each label... The cluster creation module 216 then analyzes the match-pairs to create clusters… The cluster creation module 216 processes all match-pairs and thereafter creates clusters. Master records are identified using defined rules. For example, the most complete record may be considered as the master record in each cluster.”).

Regarding claim 20, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 19. Cassidy further teaches:
determining a plurality of ratios of a first value to a second value, wherein the first value is an overlap of bigrams between one or more string components in the input record and corresponding string components in the most similar master record, and wherein the second value is a total number of bigrams in the one or more string components in the input record and the string components in the most similar master record; generating a plurality of weights based on the training set; and determining the final score based on a logical regression, the logical regression comprising a linear combination of a product of the weights and the ratios (see Cassidy, Paragraph [0030], “Pair-wise comparisons are made in the cleansed data to label the pairs as a match or a non-match” Paragraph [0051], “The machine learning algorithm identifies patterns in the data that characterize duplicate pairs of records. The matches, based on these rules, could be termed as ‘close match’, ‘exact match’, ‘non-match’ and the like. The fields marked as ‘close match’ are the records which are not the exact match but could be inferred as same on analysis.”).

Claims 5 and 17 are being rejected under 35 U.S.C. 103 as being unpatentable over Cassidy in view of Lang in view of Thomas, further in view of Khuti et al. (US 2020/0067789 A1).
Regarding claim 5, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 1. However, the combination of Cassidy, Lang, and Thomas does not explicitly teach:
wherein the master record data store comprises an elastic search data store that indexes master string representations.

Khuti teaches:
wherein the master record data store comprises an elastic search data store that indexes master string representations (see Khuti, Paragraph [0445], “Elastic Search… Distributed RESTful search engine for dealing with unstructured and semi structured data.”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Cassidy (teaching method and system for cleansing and de-duplicating data) in view of Lang (teaching collaborative data cleansing) in view of Thomas (teaching entity resolution incorporating data from various data sources), further in view of Khuti (teaching systems and methods for distributed systemic anticipatory industrial asset intelligence), and arrived at a method that incorporates elastic searching. One of ordinary skill in the art would have been motivated to make such a combination for the purposes of dealing with unstructured and semi structured data (see Khuti, Paragraph [0445]). In addition, the references (Cassidy, Lang, Thomas and Khuti) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as master records. The close relation between the references highly suggests an expectation of success.

Regarding claim 17, Cassidy in view of Lang, further in view of Thomas teaches all the limitations of claim 16. However, the combination of Cassidy, Lang, and Thomas does not explicitly teach:
wherein the master record data store comprises an elastic search data store that indexes master string representations.

Khuti teaches:
wherein the master record data store comprises an elastic search data store that indexes master string representations (see Khuti, Paragraph [0445], “Elastic Search… Distributed RESTful search engine for dealing with unstructured and semi structured data.”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Cassidy (teaching method and system for cleansing and de-duplicating data) in view of Lang (teaching collaborative data cleansing) in view of Thomas (teaching entity resolution incorporating data from various data sources), further in view of Khuti (teaching systems and methods for distributed systemic anticipatory industrial asset intelligence), and arrived at a machine that incorporates elastic searching. One of ordinary skill in the art would have been motivated to make such a combination for the purposes of dealing with unstructured and semi structured data (see Khuti, Paragraph [0445]). In addition, the references (Cassidy, Lang, Thomas and Khuti) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as master records. The close relation between the references highly suggests an expectation of success.

Claim 7 is being rejected under 35 U.S.C. 103 as being unpatentable over Cassidy in view of Lang in view of Thomas, further in view of Tereshkov et al. (US 2016/0180245 A1).
Regarding claim 7, the combination of Cassidy, Lang, and Thomas does not explicitly teach:
wherein the training set comprises dice coefficients on one or more string components of the input record and corresponding similarity search master record result.

Tereshkov teaches:
wherein the training set comprises dice coefficients on one or more string components of the input record and corresponding similarity search master record result (see Tereshkov, Paragraph [0068], “As discussed above, some of the distance algorithms used for atomic distance assessment between two different fields of the two compared records include Dice Coefficient,”).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined Cassidy (teaching method and system for cleansing and de-duplicating data) in view of Lang (teaching collaborative data cleansing) in view of Thomas (teaching entity resolution incorporating data from various data sources), further in view of Tereshkov (teaching a method and system for linking heterogeneous data sources), and arrived at a method that incorporates dice coefficients. One of ordinary skill in the art would have been motivated to make such a combination for the purposes of assessing distance (see Tereshkov, Paragraph [0068]). In addition, the references (Cassidy, Lang, Thomas and Tereshkov) teach features that are directed to analogous art and they are directed to the same field of endeavor, such as master records. The close relation between the references highly suggests an expectation of success.
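For illustration only, a Dice coefficient between two string components can be sketched as follows. Tereshkov names the Dice Coefficient as a distance algorithm but the quoted passage does not specify a tokenization; the use of character bigram sets and the names bigram_set and dice_coefficient are assumptions made for this sketch.

```python
def bigram_set(text):
    """Return the set of character bigrams in a lowercased string."""
    s = text.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_coefficient(a, b):
    """Dice coefficient: twice the shared bigrams over the total bigrams in both sets."""
    x = bigram_set(a)
    y = bigram_set(b)
    if not x and not y:
        return 0.0  # convention chosen here for two strings with no bigrams
    return 2 * len(x & y) / (len(x) + len(y))
```

Under these assumptions, a value of 1.0 indicates identical bigram sets and a value of 0.0 indicates no shared bigrams.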

Response to Arguments
Applicant’s Arguments, filed January 6th, 2022, have been fully considered, but are not persuasive. 



Applicant’s Arguments, filed January 6th, 2022, have been fully considered, but are moot in light of the new grounds of rejection. 

Applicant argues on pages 9-10 of Applicant's Remarks that the cited references do not teach “for each input record in the plurality of input records, uses a machine learning algorithm to select a master record in the plurality of master records that matches the input record based on a training set, the master string representations in the master records, and the string representations in the input record.” The Examiner respectfully disagrees.

Cassidy discloses a method for “cleansing and de-duplicating data in a database” by training “a machine learning model to identify duplicates in the cleansed database” (see Cassidy, Paragraph [0006]). As mentioned on pages 9-10 of Applicant’s Remarks, Cassidy discloses in paragraphs [0006]-[0008], [0041], and [0044] generating similarity vectors, labeling matched vectors, and using a machine learning algorithm to classify non-labeled similarity vectors. Cassidy further discloses in paragraph [0044] that the set of labeled vectors is used as a training and test set for the machine learning algorithm to classify the unlabeled similarity vectors. Therefore, the cited references are believed to teach the argued limitation: the machine learning algorithm processes the similarity vectors, which are generated from the input data and the data in the cleansed database, together with the training set, in order to label the similarity vectors, identify duplicates in the cleansed database, and select the master record that is considered the most complete (see Cassidy, Paragraph [0046]).




Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUSAM TURKI SAMARA whose telephone number is (571)272-6803.  The examiner can normally be reached on Monday - Thursday, Alternate Fridays.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Apu Mofiz, can be reached on (571)-272-4080.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HUSAM TURKI SAMARA/
Examiner, Art Unit 2161

/APU M MOFIZ/
Supervisory Patent Examiner, Art Unit 2161