Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Status
Claims 1-23 are pending.
Response to Arguments
Applicant's arguments have been fully considered but are not persuasive. The applicant appears to argue that the St. Clair reference does not teach the following: 
“1) analyzing the source data, wherein analyzing the source data includes generating data profiling statistics from the source data and classifying attributes of the source data; 
2) determining at least one data domain associated with the source data based, at least in part, on the data profiling statistics, the classified attributes, and ontology data; 
and 3) determining, for the at least one data domain associated with the source data, a number of required matching algorithms for a data matching engine to execute data deduplication within the source data”
It is unclear to the examiner, however, how these limitations distinguish the claimed invention over St. Clair.
Regarding the limitation “analyzing the source data, wherein analyzing the source data includes generating data profiling statistics from the source data and classifying attributes of the source data”, 
the applicant describes “source data” as “any data describing entities in the real physical world” (Specification Para 0048). St. Clair discloses “clinical information about an individual patients” (Para 0037). Since these patients are entities in the real physical world, “clinical information” is analogous to “source data”.
The applicant describes “data profiling statistics” as “a process of determining information about the data entities in the source data” (Specification Para 0049). St. Clair discloses “patient-centric data processes are combined to correctly identify the patient and all the data from all sources that belong to that patient” (Para 0040). “Data from all sources” is analogous to “data entities” and “correctly identify the patient” is analogous to “determining information”; therefore, “patient-centric data processes” is analogous to “data profiling statistics”.
The applicant describes “classifying attributes” as “determining metadata such as …uniqueness of attributes, etc.” (Specification Para 0073). St. Clair discloses “an integration process examines the aggregate record for duplicate and overlapping data” (Para 0017). This “duplicate data” is analogous to “uniqueness of attributes”; therefore, “examining the aggregate record for duplicate and overlapping data” is analogous to “classifying attributes of the source data”.
According to the above definitions, St. Clair teaches “analyzing the source data (i.e., clinical information), wherein analyzing the source data includes generating data profiling statistics (i.e., patient-centric data processes) from the source data (i.e., clinical information) and classifying attributes of the source data (i.e., examining the aggregate record for duplicate and overlapping data)”.

Regarding the limitation “determining at least one data domain associated with the source data based, at least in part, on the data profiling statistics, the classified attributes, and ontology data”,
the applicant defines “data domain” as “the content and context area in the physical world the data belongs, e.g., healthcare data, customer data, etc.” (Specification Para 0051). St. Clair discloses “a validation process is used in some embodiments to confirm that the patient in question is the correct patient” (Para 0019). “Correct patient” signifies both healthcare data and customer data; therefore, “correct patient” is analogous to a “data domain”. 
The applicant defines “ontology data” as “a catalog of terms and entities symbolizing real-world entities” (Specification Para 0052). St. Clair discloses “data that includes acute or chronic diseases and other health and wellness information” (Para 0039). Since these diseases are real-world entities, the “data” that represents them is analogous to “ontology data”.

According to the above definitions, St. Clair teaches “determining at least one data domain (i.e., correct patient) associated with the source data (i.e., clinical information) based, at least in part, on the data profiling statistics (i.e., patient-centric data processes), the classified attributes (i.e., examining the aggregate record for duplicate and overlapping data), and ontology data (i.e., data that includes acute or chronic diseases and other health and wellness information)”.

Regarding the limitation “determining, for the at least one data domain associated with the source data, a number of required matching algorithms for a data matching engine to execute data deduplication within the source data”, 
the common definition of “determining” is “ascertain or establish exactly, typically as a result of research or calculation” (Oxford English Dictionary). 
“A number of required matching algorithms” could be interpreted as a numerical value that represents the total number of required matching algorithms, or as a reference to the required matching algorithms themselves. The rejection in this Office action is based on the second interpretation. 
The applicant defines “matching algorithm” as “a schema allowing for comparing attributes of a record and determining whether two records may relate to the same physical entity” (Specification Para 0053).
St. Clair discloses “an integration process examines the aggregate record for duplicate and overlapping data, identifies that data, and eliminates the duplicate data” (Para 0017). The process of “eliminating duplicate data” inherently requires “a number of matching algorithms” and a “data matching engine”. A computer system cannot perform a “number of matching algorithms” without establishing exactly (i.e., determining) which algorithms are to be included in the “number of matching algorithms”. 
Therefore, St. Clair teaches “determining (i.e., establishing exactly), for the at least one data domain associated with the source data (i.e., patient(s) data), a number of required matching algorithms for a data matching engine to execute data deduplication within the source data (i.e., eliminate the duplicate data)”.
In view of the foregoing, the rejection of claim 1 (and similar claims 12 and 23) in view of St. Clair is maintained. The rejections of claims 2-11 and 13-22 are maintained due to their dependence on claims 1 and 12, respectively.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1, 10, 12, 21, and 23 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by St. Clair et al. (US 20070055552 A1), hereafter St. Clair.
Regarding claim 1, St. Clair teaches a computer-implemented method for configuring data deduplication, the method comprising: receiving source data (Para 0041, aggregation is a process that collects all data from one or more disparate sources that could belong to the patient in question); analyzing the source data, wherein analyzing the source data includes generating data profiling statistics from the source data and classifying attributes of the source data (Para 0040, In exemplary embodiments these patient-centric data processes are combined to correctly identify the patient and all the data from all sources that belong to that patient, “patient-centric data processes” is analogous to “data profiling statistics”); determining at least one data domain associated with the source data based, at least in part, on the data profiling statistics, the classified attributes, and ontology data (Para 0044, a validation process is used in some embodiments to confirm that the patient in question is the correct patient, “correct patient” is analogous to “data domain”); and determining, for the at least one data domain associated with the source data, a number of required matching algorithms for a data matching engine to execute data deduplication within the source data (Para 0092, the group of records for a particular patient may be processed to eliminate duplicate records, the action of “eliminate duplicate records” requires the computer system to determine a number of required matching algorithms in order to perform the action).
Regarding claim 10, St. Clair teaches the computer-implemented method of claim 1, wherein the data profiling statistics from the source data and the classified attributes of the source data includes one or more of: technical metadata of the received source data; data quality metric values per attribute of the source data; relationship descriptors between sets of the source data (Para 0015, these patient-centric data processes correctly identify the patient and all the data from all sources that belong to that patient); and a data classification per attribute, and thereby a linkage of the attributes and their relationships.
Regarding claim 12, St. Clair teaches a computer system for configuring data deduplication, the system comprising: a processor and a memory, communicatively coupled to the processor, wherein the memory stores program code portions that, when executed, enable the processor to: receive source data (Para 0041, aggregation is a process that collects all data from one or more disparate sources that could belong to the patient in question); analyze the source data, wherein analyzing the source data includes generating data profiling statistics from the source data and classifying attributes of the source data (Para 0040, In exemplary embodiments these patient-centric data processes are combined to correctly identify the patient and all the data from all sources that belong to that patient, “patient-centric data processes” is analogous to “data profiling statistics”); determine at least one data domain associated with the source data based, at least in part, on the data profiling statistics, the classified attributes, and ontology data (Para 0044, a validation process is used in some embodiments to confirm that the patient in question is the correct patient, “correct patient” is analogous to “data domain”); and determine, for the at least one data domain associated with the source data, a number of required matching algorithms for a data matching engine to execute data deduplication within the source data (Para 0092, the group of records for a particular patient may be processed to eliminate duplicate records, the action of “eliminate duplicate records” requires the computer system to determine a number of required matching algorithms in order to perform the action).
Regarding claim 21, St. Clair teaches the computer system of claim 12, wherein the data profiling statistics and a classification of the source data includes one or more of: technical metadata of the received source data; data quality metric values per attribute of the source data; relationship descriptors between sets of the source data; and a data classification per attribute, and thereby a linkage of the attributes and their relationships (Para 0015, these patient-centric data processes correctly identify the patient and all the data from all sources that belong to that patient).
Regarding claim 23, St. Clair teaches a computer program product for configuring data deduplication, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions including instructions to: receive source data (Para 0041, aggregation is a process that collects all data from one or more disparate sources that could belong to the patient in question); analyze the source data, wherein analyzing the source data includes generating data profiling statistics from the source data and classifying attributes of the source data (Para 0040, In exemplary embodiments these patient-centric data processes are combined to correctly identify the patient and all the data from all sources that belong to that patient, “patient-centric data processes” is analogous to “data profiling statistics”); determine at least one data domain associated with the source data based, at least in part, on the data profiling statistics, the classified attributes, and ontology data (Para 0044, a validation process is used in some embodiments to confirm that the patient in question is the correct patient, “correct patient” is analogous to “data domain”); and determine, for the at least one data domain associated with the source data, a number of required matching algorithms for a data matching engine to execute data deduplication within the source data (Para 0092, the group of records for a particular patient may be processed to eliminate duplicate records, the action of “eliminate duplicate records” requires the computer system to determine a number of required matching algorithms in order to perform the action).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 3, 11, 13, 14, and 22 are rejected under 35 U.S.C. 103 as being unpatentable over St. Clair in view of Jain (US 8515987 B1), hereafter Jain.
Regarding claim 2, St. Clair teaches the computer-implemented method of claim 1. However, St. Clair does not appear to explicitly teach the method further comprising: determining, for each determined required matching algorithm, a mapping of attributes of the source data to matching engine algorithm functions. In analogous art, Jain teaches determining, for each determined required matching algorithm, a mapping of attributes of the source data to matching engine algorithm functions (Para 17, Match 300 creates candidate matches between the input database information records and assigns a probability score to the match, “Match 300” is analogous to “matching engine algorithm functions”). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the system of St. Clair to include determining, for each determined required matching algorithm, a mapping of attributes of the source data to matching engine algorithm functions, as taught by Jain. One of ordinary skill in the art would be motivated to modify the system of St. Clair to include determining, for each determined required matching algorithm, a mapping of attributes of the source data to matching engine algorithm functions in order to consolidate database information, as taught by Jain (Para 17, FIG. 3 illustrates a process for consolidation of database information). 
Regarding claim 3, St. Clair in view of Jain (hereafter St. Clair-Jain) teaches the computer-implemented method of claim 2, wherein the matching engine algorithm functions are selected from the group consisting of: determining at least one standardizer considering a plurality of source data attributes (St. Clair, Para 0056, The data are preferably assembled in a patient data model following a standard format); determining at least one comparison function considering a plurality of source data attributes; and determining bucket groups of source data records.
Regarding claim 11, St. Clair teaches the computer-implemented method of claim 1 as shown above. St. Clair-Jain goes on to teach wherein the data matching engine is at least one of a probabilistic data matching engine (Jain, Para 17, Match 300 creates candidate matches between the input database information records and assigns a probability score to the match), a machine-learning based data matching engine and a deterministic data matching engine.
Regarding claim 13, St. Clair teaches the computer system of claim 12 as shown above. St. Clair-Jain goes on to teach wherein the program code portions further enable the processor to: determine, for each determined required matching algorithm, a mapping of attributes of the source data to matching engine algorithm functions (Jain, Para 17, Match 300 creates candidate matches between the input database information records and assigns a probability score to the match, “Match 300” is analogous to “matching engine algorithm functions”).
Regarding claim 14, St. Clair-Jain teaches the computer system of claim 13, wherein the matching engine functions are selected from the group consisting of: determining at least one standardizer considering a plurality of source data attributes (St. Clair, Para 0056, The data are preferably assembled in a patient data model following a standard format); determining at least one comparison function considering a plurality of source data attributes; and determining bucket groups of source data records.
Regarding claim 22, St. Clair-Jain teaches the computer system of claim 12, wherein the data matching engine is a probabilistic data matching engine (Jain, Para 17, Match 300 creates candidate matches between the input database information records and assigns a probability score to the match), a machine-learning based data matching engine or a deterministic data matching engine.
Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over St. Clair in view of Kao (US 20100070505 A1), hereafter Kao.
Regarding claim 4, St. Clair teaches the computer-implemented method of claim 1 as shown above. However, St. Clair does not appear to explicitly teach wherein determining the at least one data domain associated with the source data is further based, at least in part, on: configuring, for each detectable data domain, a domain detection threshold value for the data matching engine, the domain detection threshold value being indicative of a domain being detected as a separate domain, configuring a sub-class threshold value for a detection of the domain, the sub-class threshold value being indicative of a minimum number of detected sub-classes in a record of the source data; and determining a confidence threshold value indicative of an average value of confidence values of detected sub-classes to determine a detected class. In analogous art, Kao teaches configuring, for each detectable data domain, a domain detection threshold value for the data matching engine (Para 0053, a value indicating the probability of a particular folder containing data records of a selected data classification type may be displayed only if the probability is greater than a threshold value). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of St. Clair to include configuring a domain detection threshold value for the data matching engine, as taught by Kao. One of ordinary skill in the art would be motivated to modify the method of St. Clair to include configuring a domain detection threshold value for the data matching engine in order to help increase the efficiency of the data classification efforts, as taught by Kao (Para 0053, The threshold value may be used to help increase the efficiency of the administrator's data classification efforts).
Regarding claim 15, St. Clair teaches the computer system of claim 12 as shown above. St. Clair in view of Kao (hereafter St. Clair-Kao) goes on to teach wherein the program code portions that enable the processor to determine the at least one data domain further enable the processor to: configure, for each detectable data domain, a domain detection threshold value for the data matching engine, the domain detection threshold value being indicative of a domain being detected as a separate domain (Kao, Para 0053, a value indicating the probability of a particular folder containing data records of a selected data classification type may be displayed only if the probability is greater than a threshold value); configure a sub-class threshold value for a detection of the domain, the sub-class threshold value being indicative of a minimum number of detected sub-classes in a record of source data; and determine a confidence threshold value indicative of an average value of confidence values of detected sub-classes to determine a detected class.
Claims 5 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over St. Clair in view of Kao and further in view of Jain.
Regarding claim 5, St. Clair-Kao teaches the computer-implemented method of claim 4 as shown above. However, St. Clair-Kao does not appear to explicitly teach determining a detected data domain if the required matching algorithm of the data matching engine has to be configured. In analogous art, Jain teaches determining a detected data domain (Jain, Para 19, The candidate match groups) if the required matching algorithm of the data matching engine has to be configured (Jain, Para 17, Database information is input into match 300, inputting database information is analogous to configuring the data matching engine). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of St. Clair-Kao to include determining a detected data domain if the required matching algorithm of the data matching engine has to be configured, as taught by Jain. One of ordinary skill in the art would be motivated to modify the method of St. Clair-Kao to include determining a detected data domain if the required matching algorithm of the data matching engine has to be configured in order to consolidate database information, as taught by Jain (Para 17, FIG. 3 illustrates a process for consolidation of database information).
Regarding claim 16, St. Clair-Kao in view of Jain teaches the computer system of claim 15, wherein the program code portions further enable the processor to: determine a detected data domain (Jain, Para 17, candidate match groups) if the required matching algorithm of the data matching engine has to be configured (Jain, Para 17, Database information is input into match 300, inputting database information is analogous to configuring the data matching engine).
Claims 6-9 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over St. Clair in view of Goldenberg (US 20090089630 A1), hereafter Goldenberg.
Regarding claim 6, St. Clair teaches the computer-implemented method of claim 1 as shown above. However, St. Clair does not appear to explicitly teach the method further comprising: configuring an auto-link threshold value depending on at least one of a detected false positive and/or a detected false negative result during a matching of records. In analogous art, Goldenberg teaches configuring an auto-link threshold value depending on at least one of a detected false positive and/or a detected false negative result during a matching of records (Para 0022, The user to analyze and see how the configured autolink thresholds affect system performance (e.g., false negatives or false positives, throughput, etc.)). It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify the method of St. Clair to include configuring an auto-link threshold value depending on at least one of a detected false positive and/or a detected false negative result during a matching of records, as taught by Goldenberg. One of ordinary skill in the art would be motivated to modify the method of St. Clair to include configuring an auto-link threshold value depending on at least one of a detected false positive and/or a detected false negative result during a matching of records in order to allow the system to be configured according to a user's desire, as taught by Goldenberg (Para 0013, the system may be configured according to a user's desire).
St. Clair in view of Goldenberg (hereafter St. Clair-Goldenberg) goes on to teach configuring a clerical review rate threshold value depending on a number of clerical tasks to be performed (Goldenberg, Para 0181, The user may adjust the CR threshold to yield a fixed number of tasks).
Regarding claim 7, St. Clair-Goldenberg teaches the computer-implemented method of claim 6, further comprising: determining two records to be duplicates if their combined matching score value is greater than the auto-link threshold value (Goldenberg, Para 0063, if the overall score is greater than the autolink threshold the records may be linked). 
Regarding claim 8, St. Clair-Goldenberg teaches the computer-implemented method of claim 6, further comprising: determining two records to not be duplicates if their combined matching score value is smaller than the clerical review rate threshold value (Goldenberg, Para 0186, The candidate clerical-review threshold is set based upon the desired false-negative rate. For example, if it is desired for 95% of the duplicates to score above our clerical-review threshold, the default is set at 0.05).
Regarding claim 9, St. Clair-Goldenberg teaches the computer-implemented method of claim 6, further comprising: determining two records to be assessed clerically if the two records are determined to be duplicates (Goldenberg, Para 0024, Based upon a clerical review of a set of linked data records in which a user may determine whether records have been correctly or incorrectly linked).
Regarding claim 17, St. Clair-Goldenberg teaches the computer system of claim 12, wherein the program code portions further enable the processor to: configure an auto-link threshold value depending on detected false positive and/or false negative results of the matching of records (Goldenberg, Para 0022, The user to analyze and see how the configured autolink thresholds affect system performance (e.g., false negatives or false positives, throughput, etc.)); and configure a clerical review rate threshold value depending on a number of clerical tasks to be performed (Goldenberg, Para 0181, The user may adjust the CR threshold to yield a fixed number of tasks).
Regarding claim 18, St. Clair-Goldenberg teaches the computer system of claim 16, wherein the program code portions further enable the processor to: determine two records to be duplicates if their combined matching score value is greater than the auto-link threshold value (Goldenberg, Para 0063, if the overall score is greater than the autolink threshold the records may be linked).
Regarding claim 19, St. Clair-Goldenberg teaches the computer system of claim 16, wherein the program code portions further enable the processor to: determine two records to not be duplicates if their combined matching score value is smaller than the clerical review rate threshold value (Goldenberg, Para 0186, The candidate clerical-review threshold is set based upon the desired false-negative rate. For example, if it is desired for 95% of the duplicates to score above our clerical-review threshold, the default is set at 0.05).
Regarding claim 20, St. Clair-Goldenberg teaches the computer system of claim 16, wherein the program code portions further enable the processor to: determine two records to be assessed clerically if the two records are determined to be duplicates (Goldenberg, Para 0024, Based upon a clerical review of a set of linked data records in which a user may determine whether records have been correctly or incorrectly linked).
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BROOKS T HALE whose telephone number is (571)272-0160. The examiner can normally be reached Monday - Friday 9:00 - 5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Featherstone can be reached on (571) 270-3750. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/B.T.H./Examiner, Art Unit 2166
/MARK D FEATHERSTONE/Supervisory Patent Examiner, Art Unit 2166