Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The Information Disclosure Statements (IDS) filed February 19, 2020 and March 11, 2020 have been considered.
Claims 1-9, filed February 19, 2020, are examined on the merits.
Double Patenting
A rejection based on double patenting of the “same invention” type finds its support in the language of 35 U.S.C. 101 which states that “whoever invents or discovers any new and useful process... may obtain a patent therefor...” (Emphasis added). Thus, the term “same invention,” in this context, means an invention drawn to identical subject matter. See Miller v. Eagle Mfg. Co., 151 U.S. 186 (1894); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Ockert, 245 F.2d 467, 114 USPQ 330 (CCPA 1957).
A statutory type (35 U.S.C. 101) double patenting rejection can be overcome by canceling or amending the claims that are directed to the same invention so they are no longer coextensive in scope. The filing of a terminal disclaimer cannot overcome a double patenting rejection based upon 35 U.S.C. 101.
Claims 1-9 are rejected under 35 U.S.C. 101 as claiming the same invention as that of claims 1-9 of prior U.S. Patent No. 10599732 B2. This is a statutory double patenting rejection.
Comparison of the claims of Application No. 16794895 with the claims of U.S. Patent No. 10599732 B2:
1. A method for linking data records across datasets, the method comprising: identifying a plurality of datasets, each dataset comprising at least one data record, each data record associated with an entity and comprising one or 
creating a token set for each data record, each token set comprising a plurality of tokens and each token associated with one of the attributes and comprising a representation of the attribute value of the associated attribute, each token set comprising all tokens obtained from converting all values associated with a given attribute in a given data record into one or more tokens;
comparing attributes across datasets by comparing token sets;
identifying pairs of attributes satisfying a predetermined similarity threshold based on a similarity of attribute values determined by comparing token sets;
using the identified pairs of attributes to identify linkage points between pairs of data records, each data record in each pair of data records contained in a different dataset in a given pair of datasets and each pair of data records associated with a common entity;

linking data records associated with common entities across datasets using the identified linkage points.

creating a token set for each data record, each token set comprising a plurality of tokens and each token associated with one of the attributes and comprising a representation of the attribute value of the associated attribute, each token set comprising all tokens obtained from converting all values associated with a given attribute in a given data record into one or more tokens; comparing attributes across datasets by comparing token sets; 
identifying pairs of attributes satisfying a predetermined similarity threshold based on a similarity of attribute values determined by comparing token sets; 
using the identified pairs of attributes to identify linkage points between pairs of data records, each data record in each pair of data records contained in a different dataset in a given pair of datasets and each pair of data records associated with a common entity; 

linking data records associated with common entities across datasets using the identified linkage points.

2. The method of claim 1, wherein: each value comprises a value string comprising at least one of alpha-numeric characters, non-alphanumeric characters and spaces; and each token comprises at least a portion of the value string.
3. The method of claim 2, wherein creating the token set further comprises: converting the value string in each value into a given token by placing the value string into the given token without changes; transforming any upper case 

replacing non-alphanumeric characters with spaces or combinations thereof.


4. The method of claim 1, wherein: each value comprises a value string comprising at least one of alpha-numeric characters, non-alphanumeric characters and spaces; and creating the token set further comprises converting the value string in at least one value into a plurality of tokens, each token in the plurality of token comprising at least a portion of the value string.
5. The method of claim 4, wherein converting the value string in at least one value into a plurality of tokens further comprises: placing the value string into the set of tokens without changes;
transforming any upper case alpha-numeric character in the value string into a corresponding lower case alpha character; breaking the value string into a plurality of tokens defined by the spaces in the value string, replacing nonalphanumeric characters with spaces; and 



6. The method of claim 1, wherein the method further comprises transforming each data record in each dataset into one or more data record triples, each data record triple comprising a label, an identification of a given attribute and a value for the given attribute.
7. The method of claim 6, wherein the label comprises an alphanumeric string.
7. The method of claim 6, wherein the label comprises an alpha-numeric string.
8. The method of claim 6, wherein creating the token set further comprises: converting each data record triple into one or more tokens.
8. The method of claim 6, wherein creating the token set further comprises converting each data record triple into one or more tokens.
9. The method of claim 1, wherein identifying matching attributes further comprises: comparing the token sets of attributes using a set similarity function comprising or intersection size, Jaccard similarity coefficient, Dice’s coefficient or maximum inclusion degree or an information retrieval type relevance function comprising cosine similarity with term frequency—inverse document frequency or Okapi BM25 to identify 




CONCLUSION
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Morton et al. (US 2015199363 A1) discloses that one of the most difficult and complex tasks in a data processing environment involves the data integration process of accurately matching, linking, and/or clustering records from multiple data sources that refer to a person, a business, a hierarchical structure or other entity.
Bamberger et al. (US 20020178394 A1) discloses a topic server 70 that is operative to extract data from a plurality of semi-structured data sources 90 and that preferably performs one or more of the following topic-building data processing functions: data copying, data linking, topic extraction, trigger extraction, unification and/or one or more of the following look-up functions: disambiguation and lookup.
Mitchell, C. (US 20100161616 A1) discloses linking the second structured content record with the unstructured content stored in the data storage system via the indexing data to allow access to the unstructured content in the data storage system via the second structured content record.
Roy et al. (US 20100228794 A1) discloses that the step of linking, performed in linker element 120, includes but is not limited to mapping a plurality of data elements between a structured data source and an unstructured textual data source.
Telloli et al. (US 20100094853 A1) discloses other link-analysis measures, spam scores, etc. The query-dependent score may be any algorithm that measures the relevance of a resource with respect to 
Patent applicants with problems or questions regarding electronic images that can be viewed in the Patent Application Information Retrieval system (PAIR) can now contact the USPTO's Patent Electronic Business Center (Patent EBC) for assistance.  Representatives are available to answer your questions daily from 6 am to midnight (EST). The toll free number is (866) 217-9197. When calling please have your application serial or patent number, the type of document you are having an image problem with, the number of pages and the specific nature of the problem.  The Patent Electronic Business Center will notify applicants of the resolution of the problem within 5-7 business days. Applicants can also check PAIR to confirm that the problem has been corrected.  The USPTO's Patent Electronic Business Center is a complete service center supporting all patent business on the Internet. The USPTO's PAIR system provides Internet-based access to patent application status and history information. It also enables applicants to view the scanned images of their own application file folder(s) as well as general patent information available to the public. 
For all other customer support, please call the USPTO Call Center (UCC) at 800-786-9199.  The USPTO's official fax number is 571-272-8300.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to C. Dune Ly, whose telephone number is (571) 272-0716.  The examiner can normally be reached on Monday-Friday from 8 A.M. to 4 P.M.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Neveen Abel-Jalil, can be reached on 571-270-0474.
/Cheyne D Ly/
Primary Examiner, Art Unit 2152