Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This action is responsive to the application filed on 01/29/2018.
Claims 1-21 are pending.
Claims 1-21 are rejected.

Claim Objections
Claims 1, 8, and 15 are objected to because of the following informalities:
Claims 1, 8, and 15 recite “for the plurality of field” (line 7 of claim 1), which appears to be a typographical error. A suggested amendment to overcome this objection is as follows: “for the plurality of fields”.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-2, 4-9, 11-16, and 18-21 are rejected under 35 U.S.C. 103 as being unpatentable over Bilenko et al (“Adaptive Duplicate Detection Using Learnable String Similarity Measures”, 2003) hereinafter Bilenko, in view of Tereshkov et al (US Pub 20160180245) hereinafter Tereshkov.
Regarding claims 1, 8, and 15, Bilenko teaches a computer-implemented machine learning method, one or more non-transitory computer readable media storing a program of instructions that is executable by a device to perform, and a system, comprising: one or more computing processors; one or more non-transitory computer readable media storing a program of instructions that is executable by the one or more computing processors to perform (sections 2.1-2.2, and 3.2 teach executing “programming method[s]” and measuring “computational time”/“computational” expense indicative of performing the disclosed embodiments on a computer system, well known to include one or more memories storing/communicating executable code to one or more processors for performing the described functions):
receiving a training dataset having a plurality of training instances, each training instance in the plurality of training instances comprising (a) a pair of training records with a first record and a second record and (b) a label indicates whether there is a match between the first record and the second record, the first record having a first plurality of field values for a plurality of fields, the second record having a second plurality of field values for the plurality of field (sections 3.1-3.2 and Figs. 4 and 6 teach obtaining matched “records composed of k different fields” with computed “similarity between two field values” (first/second records having a first/second plurality of field values for a plurality of fields), where the “[m]atched pairs of duplicate records can be used to construct a training set (each training instance in the plurality of training instances) of such feature vectors by assigning them a positive class label (label indicates whether there is a match between the first record and the second record). Pairs of records that are not labeled as duplicates implicitly form the complementary set of negative examples (label indicates whether there is a match between the first record and the second record)”; and these are further used as “training ; 
determining a matching score vector for each such training instance, the matching score vector comprising a set of components storing a set of match scores for a set of extracted features derived from the first plurality of field values and the second plurality of field values (sections 3.1-3.2 and Figs. 4 and 6 teach calculating a vector representation (determining a matching score vector) of matched database records, where “any pair of records (for each such training instance) by an mk-dimensional vector. Each component of the vector (the matching score vector comprising a set of components) represents similarity between two field values of the records that is calculated using one of the m distance metrics (storing a set of match scores for a set of extracted features derived from the first plurality of field values and the second plurality of field values)”, and that these “feature vectors” are taught to be “composed of distance features” (set of extracted features)); 
based on a plurality of matching score vectors for the plurality of training instances in the training dataset and a match objective function, determining a set of match score thresholds for the set of extracted features (sections 3.1-3.2, 4.2, and Figs. 4 and 6 teach “selecting threshold values (determining a set of match score thresholds) that are appropriate for a particular database” from the “labeled data” comprising record pairs feature vectors (based on a plurality of matching score vectors for the plurality of training instances in the training dataset…for the set of extracted features), and a “precision-recall” associated “F-measure” calculation reflective of model accuracy (match objective function)); 
generating a set of match rules, each match rule in the set of match rules comprising a set of predicates based at least in part on a set of predicate features selected from the set of extracted features, each predicate in the set of predicates making a predication on whether two records match by comparing a match score derived from the two records against a match score threshold (sections 2.2.3, 3.1-3.2, and Figs. 4 and 6 teach each record having multiple fields and that a classifier learns (generating) multiple similarity measures/metrics (a set of match rules) “for each field of the database, [where] two learnable distance measures, d1 and d2, are trained and used to compute similarity for that field (generating a set of match rules). The values computed by these measures form the feature vector that is then classified by a support vector machine”; where the “vectors composed of distance features” from the record fields (each match rule in the set of match rules comprising a set of predicates based at least in part on a set of predicate features selected from the set of extracted features) are used as “training data”. Further, this classification categorizes “the resulting feature vector as belonging to the class of duplicates or non-duplicates, resulting in a distance estimate” and confidence which “represents similarity between the database records” in relation to a selected “threshold value” (each predicate in the set of predicates making a predication on whether two records match by comparing a match score derived from the two records against a match score threshold)); 
applying the set of matching rules to two or more records each having a plurality of field values for the plurality of fields to determine whether there is a match between any two of the two or more records (sections 2.2.3, 3.1-3.2, and Figs. 4-6 teach utilizing a “trained SVM” or classifier (applying the set of matching rules) .
Bilenko at least implies a computer-implemented machine learning method, one or more non-transitory computer readable media storing a program of instructions that is executable by a device to perform, and a system, comprising: one or more computing processors; one or more non-transitory computer readable media storing a program of instructions that is executable by the one or more computing processors to perform (see mapping above); and generating a set of match rules, each match rule in the set of match rules comprising a set of predicates based at least in part on a set of predicate features selected from the set of extracted features, each predicate in the set of predicates making a predication on whether two records match by comparing a match score derived from the two records against a match score threshold (see mapping above), however Tereshkov teaches a computer-implemented machine learning method, one or more non-transitory computer readable media storing a program of instructions that is executable by a device to perform, and a system, comprising: one or more computing processors; one or more non-transitory computer readable media storing a program of instructions that is executable by the one or more computing processors to perform (paragraphs 0030 and 0075-0077 teach a “computer-readable medium” storing and communicating “a set of instructions” to 
generating a set of match rules, each match rule in the set of match rules comprising a set of predicates based at least in part on a set of predicate features selected from the set of extracted features, each predicate in the set of predicates making a predication on whether two records match by comparing a match score derived from the two records against a match score threshold (abstract and paragraphs 0016-0017, 0023, 0063-0065 teach performing de-duplication for cleaning up record databases by training a classifier’s matching “rules” on a “set of training examples” (generating a set of match rules) including extracted “vectors” and “sub-vectors” with field features from database records (each match rule in the set of match rules comprising a set of predicates based at least in part on a set of predicate features selected from the set of extracted features), record field similarity “distance metrics”, and determined vector “distance represented by the distance assessment classifier is greater than a threshold” for linked/matched records (each predicate in the set of predicates making a predication on whether two records match by comparing a match score derived from the two records against a match score threshold)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Tereshkov’s teachings of a system to learn, detect, and classify record similarity field vectors for performing de-duplication for cleaning up record databases into Bilenko’s teaching of detecting duplicate database records through record field value similarity analysis in order to increase the “speed” of 

Regarding claims 2, 9, and 16, the combination of Bilenko and Tereshkov teach all the claim limitations of claims 1, 8, and 15 above; and further teach wherein each match score thresholds in the set of match score thresholds is used for comparison with match scores of a respective feature in the set of extracted features, as computed from records having field values of the plurality of fields, to make match or non-match predictions with respect to the records (Tereshkov, abstract and paragraphs 0016-0017, 0023, 0057, 0063-0065 teach training a classifier’s matching “rules” on a “set of training examples” including extracted “vectors” and “sub-vectors” with field features from database records (extracted features, as computed from records having field values of the plurality of fields), record field similarity “distance metrics”, and determined vector “distance represented by the distance assessment classifier (match scores of a respective feature in the set of extracted features) is greater than (comparison) a [selected] threshold” or less than a “threshold” of different threshold values (each match score thresholds in the set of match score thresholds is used) for linked/matched records (to make match or non-match predictions with respect to the records)); wherein each such match score thresholds in the set of match score thresholds is obtained from a plurality of match scores of the respective feature, as computed from training records in a plurality of instances, by minimizing a match error based on the match objective function (Tereshkov, abstract, paragraphs 0016-0017, 0023, 0057-0058, and 0062-0065 teach calculating .
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Tereshkov’s teachings of calculating record match thresholds into Bilenko’s teaching of detecting duplicate database records through record field value similarity analysis in order to increase accuracy of record similarity measurements by finding optimal thresholds through training (Tereshkov, abstract and paragraphs 0016-0017, 0023, 0057-0058, and 0062-0065).
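
For illustration only, and not as a characterization of either reference's actual implementation, the threshold selection discussed for claims 2, 9, and 16 can be sketched as follows. The function names (`f1_score`, `best_threshold`) and the toy match scores are hypothetical. The sketch assumes a single extracted feature and picks the threshold that maximizes the F-measure over labeled training pairs, which is one way of minimizing a match error defined as 1 minus F-measure.

```python
def f1_score(scores, labels, threshold):
    """F-measure of the predicate `score >= threshold` against 0/1 labels."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0  # no true positives: precision/recall undefined or zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels):
    """Return the observed score that, used as a threshold, gives the highest F-measure."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_score(scores, labels, t))


# Hypothetical match scores for one feature (e.g., name similarity) and
# hypothetical labels: 1 = labeled duplicate pair, 0 = non-duplicate pair.
name_scores = [0.95, 0.90, 0.40, 0.85, 0.30, 0.20]
labels = [1, 1, 0, 1, 0, 0]
name_threshold = best_threshold(name_scores, labels)
```

Repeating the same search for each extracted feature would yield one threshold per feature, corresponding to a set of match score thresholds.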

Regarding claims 4, 11, and 18, the combination of Bilenko and Tereshkov teach all the claim limitations of claims 1, 8, and 15 above; and further teach wherein the two or more records belong to a set of database records among a plurality of sets of database records stored in a cloud-based computing system (Tereshkov, abstract and paragraphs 0015-0019, 0044, and 0067 teach “a pair of database records (the two or more records)” from databases of multiple records (belong to a set of database records among a plurality of sets of database records), where “databases may ; wherein each set of database records represent a respective type of entity among a plurality of different types of entities (Tereshkov, paragraph 0018 teaches “databases may contain elements of records of interest (represent a respective type of entity among a plurality of different types of entities)”, where records in “database 1 (each set of database records) may include a table of users with names and email addresses (represent a respective type of entity among a plurality of different types of entities) and database 2 (each set of database records) may contain transaction history for a user (represent a respective type of entity among a plurality of different types of entities)”); wherein the plurality of different types of entities includes at least one of: accounts, contacts, leads, company locations, company entities, products, shipping addresses, time events, or calendar entries (Bilenko, sections 3.1-3.2 and Fig. 5, and Table 2 teaches records in a restaurant database having restaurant address record field (plurality of different types of entities includes at least one of: company locations) and restaurant cuisine type record field (plurality of different types of entities includes at least one of: products).
Alternatively, Tereshkov, paragraph 0018 teaches record fields in “database 1 may include a table of users with names and email addresses (plurality of different types of entities includes at least one of: accounts)”).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Tereshkov’s teachings of record databases for specific record field types for performing de-duplication for cleaning up record databases into Bilenko’s teaching of detecting duplicate database records 

Regarding claims 5, 12, and 19, the combination of Bilenko and Tereshkov teach all the claim limitations of claims 1, 8, and 15 above; and further teach wherein the set of match rules are initially generated fully automatically by a machine learning process from the plurality of training instances in the training dataset (Tereshkov, paragraphs 0012, 0023, 0042, 0051-0052, 0063-0065, and 0069 teach “automatically” setting the weights or “[d]istance metrics” for calculating vector distances in similarity measures of a “classifiers” learned distance calculations via AI, taught to be “machine-learning techniques” (set of match rules are initially generated fully automatically) during training with a labeled “set of training examples” (by a machine learning process from the plurality of training instances in the training dataset)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Tereshkov’s teachings of automatically setting calculation metrics in learned classifier record field similarity measurements via “machine-learning techniques” into Bilenko’s teaching of detecting duplicate database records through record field value types and similarity analysis in order to optimize similarity measurements through “automatically” tuned field similarity weights (Tereshkov, paragraphs 0012, 0023, 0042, 0051-0052, 0063-0065, and 0069).

Regarding claims 6, 13, and 20, the combination of Bilenko and Tereshkov teach all the claim limitations of claims 1, 8, and 15 above; and further teach wherein the set of match rules comprises a match rule that is displayed to a user through a user interface and that is edited by the user through the user interface (Tereshkov, paragraphs 0025, 0041-0042, 0051-0052, 0065, 0069, and claim 11 teach an API for interacting with data, such as “manually” setting metrics for calculating vector distances in similarity measures of a “classifiers” learned distance calculations (set of match rules comprises a match rule that is displayed to a user through a user interface and that is edited by the user through the user interface), and further presenting linked database records); wherein an editing operation performed based on user input includes one of: modifying predicate composition of the match rule, modifying a match score threshold in a predicate in the match rule, modifying a feature extraction method for extracting a feature in a predicate in the match rule, or modifying a similarity function used to determine match scores for a feature in a predicate in the match rule (Tereshkov, paragraphs 0025, 0042, 0051-0052, 0065, 0069, and claim 11 teach API for “manually” setting the weights or “[d]istance metrics” for calculating vector distances in similarity measures (editing operation performed based on user input includes one of: modifying a similarity function used to determine match scores for a feature in a predicate in the match rule)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to implement Tereshkov’s teachings of manually setting calculation metrics in learned classifier record field similarity measurements via an API into Bilenko’s teaching of detecting duplicate database records through record 

Regarding claims 7, 14, and 21, the combination of Bilenko and Tereshkov teach all the claim limitations of claims 1, 8, and 15 above; and further teach wherein the set of match rules are among a plurality of sets of match rules generated at least in part by supervised machine learning implemented by one or more computing devices (sections 2.1-2.2, and 3.2 teach executing “programming method[s]” and measuring “computational time”/“computational” expense indicative of performing the disclosed embodiments on a computer system, well known to include one or more memories storing/communicating executable code to one or more processors (computing devices implementing) for, as taught in sections 2.2.3, 3.1-3.2 and 4.3.3, using “large amounts of training data” and clustering “a large database” of records to train a SVM classifier (at least in part by supervised machine learning) to learn similarity distance measurements/metrics (the set of match rules are among a plurality of sets of match rules generated)); wherein the plurality of sets of match rules is applied by a computing system to provide at least one of: (a) automatic entity matching and recognition among a massive volume of data in the computing system (Bilenko, sections 2.1-2.2, and 3.2 teach computer system as mapped above that, as taught in sections 2.2.3, 3.1-3.2, 4.3.3, and Figs. 4-6, execute clustering and the trained SVM classifier to “[p]airs of records” in “a large database” of records for “full similarity comparison” of all record field values programmatically without , (b) data consistency across over a set of database tables across a plurality of instances of one or more datacenters in the computing system, or (c) complete white-box information about a match or non-match decision made with any match rule in the plurality of sets of match rules in terms of specific predicates used in the match rule, specific feature extraction methods used in the specific predicates of the match rule, specific similarity measure used to compute match scores for the specific predicates of the match rule, or specific match score threshold used to compare computed match scores for the specific predicates of the match rule.
Bilenko at least implies supervised machine learning implemented by one or more computing devices (see mapping above) and computing system (see mapping above), however Tereshkov teaches supervised machine learning implemented by one or more computing devices and computing system (Tereshkov, paragraphs 0030 and 0075-0077 teach a “computer-readable medium” storing and communicating “a set of instructions” to “processors”/a “processing unit”/“instruction execution system” (one or more computing devices) for, as taught in paragraphs 0012, 0023, 0042, 0051-0052, 0063-0065, and 0069, “automatically” setting the weights or “[d]istance metrics” for calculating vector distances in similarity measures of a “classifiers” learned distance calculations via AI, taught to be “supervised machine-learning techniques” during training with a labeled “set of training examples” (implemented supervised machine learning)).

Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Bilenko et al (“Adaptive Duplicate Detection Using Learnable String Similarity Measures”, 2003) hereinafter Bilenko, in view of Tereshkov et al (US Pub 20160180245) hereinafter Tereshkov, and further in view of Bahmani et al (“ERBlox: Combining matching dependencies with machine learning for entity resolution”, 2017) hereinafter Bahmani.
Regarding claims 3, 10, and 17, the combination of Bilenko and Tereshkov teach all the claim limitations of claims 1, 8, and 15 above. 
While the combination does teach training a machine learning classifier on “vectors composed of distance features” from the record fields to learn multiple similarity measures/metrics, Bahmani better teaches wherein the set of match rules comprises a match rule that are conjunctively joined by two or more predicates (abstract and sections 5-6 teach matching dependency being “declarative rules” (set of match rules) that include (comprises) “conjunction of relational atoms plus comparison atoms via similarity predicates” (match rule that are conjunctively joined by two or more predicates) for entity resolution in detecting duplicate records and merging for ; wherein the two or more predicates comprises a first predicate generated based on a first feature that is identified by a machine learning process as the most discriminating feature in the set of extracted features (sections 2.2-2.3, 3, and 5-6 teach creating relational predicates that represent record entities, each predicate (two or more predicates comprises a first predicate generated based on) includes a subset of chosen attributes (first feature) through “machine learning” functions including “[f]eature selection” (by a machine learning process), where “those attributes that have strong discriminatory power, to achieve maximum classification recall and precision” (first feature that is identified by a machine learning process as the most discriminating feature in the set of extracted features)); wherein the two or more predicates comprises a second predicate generated based on a second feature that is identified as having the least mutual information with the first feature (sections 2.2-2.3, 3, and 5-6 teach creating relational predicates that represent record entities, each predicate (two or more predicates comprises a second predicate generated based on) includes a subset of chosen attributes and block number, where a predicate’s block number (second feature) can indicate that the records “will never be declared duplicates” (identified as having the least mutual information with the first feature)).
Thus it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to modify detecting duplicate database records through record field value similarity analysis, as taught by Bilenko as modified by a system to learn, detect, and classify record similarity field vectors for performing de-duplication for cleaning up record databases as taught by Tereshkov, to include 


	
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Fernandez et al (US Pub 20170085509) teaches detecting and filtering out duplicative information in social media posts through machine learning. 
Osenia et al (US Pub 20180137150) teaches eliminating or reducing duplicate records through supervised machine learning entity resolution. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CLINT MULLINAX whose telephone number is 571-272-3241.  The examiner can normally be reached on Mon - Fri 8:00-4:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov can be reached on 571-270-3428.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.





/C.M./Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123