Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
2. 	This office action is in response to an amendment filed on 11/22/2021. The amendment has been entered and considered. 

3. 	Claims 1, 3, 13 and 20 have been amended. Claims 1-20 are now pending in this office action. 

4. 	Applicant’s arguments with respect to the rejection of claims under 35 U.S.C. § 102 (a)(i) and 103(a) have been fully considered but are not persuasive. The amendment necessitated the new ground of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 

Applicant’s arguments on page 10 recite “storing naming rules for selected combinations of the attributes of the data items, wherein the naming rules are based on the plurality of attributes;” and “prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order of the naming rules, depending on an importance of the naming rules for entity detection.” Applicant argues that prioritizing a naming rule based on an importance (e.g., by determining which attributes may be more likely to result in entity detection) is not disclosed by either Muehlich or Tai. Further, the Chaudhuri, 
Examiner respectfully disagrees, as Tai et al discloses “storing naming rules for selected combinations of the attributes of the data items, wherein the naming rules are based on the plurality of attributes;” (Paragraphs [0036], [0037] A naming rule specifies the attributes or combination of attributes used to indicate characteristics or properties of the resource and that can be used to uniquely identify a resource (i.e., naming rules are based on the plurality of attributes to uniquely identify a resource). Also see, for clarity, Paragraph [0074] Each naming rule uses a different set of attributes to derive a valid name….A rule is determined to have the highest priority if it has been selected as such, by a CMDB manager, either according to complying with pre-set naming rules or by the CMDB manager manually choosing a particular rule to have the highest priority for a particular situation or for a particular resource).
Tai further teaches “prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order of the naming rules, depending on an importance of the naming rules for entity detection.” (Paragraph [0048] The order of the attributes in the URI string is defined and determined by the naming rule (i.e., prioritizing the naming rules by applying the sequence/order of the attributes in a URI string)).
Therefore, the rejection under 35 U.S.C. 103 is maintained for claims 1, 13 and 20.

Claim Rejections - 35 U.S.C. § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

5. 	Claims 1-4, 8, 13-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over MUEHLICH; Christoph (US 20170011088 A1) in view of Tai; Ling-Ching W (US 20120096163 A1).		

	Regarding independent claim 1, MUEHLICH; Christoph (US 20170011088 A1) teaches, a computer-implemented method for unambiguously identifying entities in a database system, the method comprising: storing data items in a table of a database, wherein the data items are stored as records comprising a plurality of attributes  (Paragraph [0012] A first aspect of the invention relates to a method for finding doublets in a database. As defined above, a doublet or duplicate may be two or more records in the database, which refer to the same logical data, such as a specific entity, person, device, machine, etc. These two or more records need not contain the same data, but only similar data, due to, for example, misspellings and/or, missing field data. [0014] the data in the database may be organized in database tables comprising a plurality of records being organized in table fields); 
	determining a hash value for each of the selected combinations of the attributes of the data items, and identifying duplicate data items using the determined hash values….(Paragraph [0015] According to an embodiment of the invention, the method comprises: (in a first pass) calculating hash values for at least two field groups for records in the database, wherein a field group comprises at least two fields, and the hash value of a field group for a record, which is based on the values in the at least two fields of the respective field group stored in the respective record; storing these hash values for each record; and (in a second pass), identifying doublets by comparing the hash values of two records, which were calculated during the first pass, wherein two records are a doublet, when the hash values of at least one field group are equal. [0017] the hash values are based on field groups, i.e. one hash value is not calculated from a value in a single field but from at least two fields. For example, a first field group may comprise the fields "name", "e-mail" and a second field group may comprise the fields "identification card number" and "country". A field group may be adapted for uniquely identifying the entity (person, device, etc.) represented by the record. The fields may be table fields of a database table). 
MUEHLICH et al fails to explicitly teach, storing naming rules for selected combinations of the attributes of the data items, wherein the naming rules are based on the plurality of attributes; prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order of the naming rules, depending on an importance of the naming rules for entity detection; … the prioritized naming rules.
Tai; Ling-Ching W (US 20120096163 A1) teaches, storing naming rules for selected combinations of the attributes of the data items, wherein the naming rules are based on the plurality of attributes (Paragraphs [0036], [0037] A naming rule specifies the attributes or combination of attributes used to indicate characteristics or properties of the resource and that can be used to uniquely identify a resource (i.e., naming rules are based on the plurality of attributes to uniquely identify a resource). Also see, for clarity, Paragraph [0074] Each naming rule uses a different set of attributes to derive a valid name….A rule is determined to have the highest priority if it has been selected as such, by a CMDB manager, either according to complying with pre-set naming rules or by the CMDB manager manually choosing a particular rule to have the highest priority for a particular situation or for a particular resource);
prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order of the naming rules, depending on an importance of the naming rules for entity detection; … the prioritized naming rules (Paragraph [0048] The order of the attributes in the URI string is defined and determined by the naming rule (i.e., prioritizing the naming rules by applying the sequence/order of the attributes in a URI string)).
Therefore it would have been obvious to one of the ordinary skill in the art before the effective filing date of the claimed invention, to have modified the teachings of MUEHLICH et al by storing naming rules for selected combinations of the attributes of the data items, wherein the naming rules are based on the plurality of attributes; prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order 
One of the ordinary skill in the art would have been motivated to make this modification where each product may maintain its own separate data that is related to the set of resources it manages, and each product may comprise its own set of naming rules for naming or identifying a resource. As the products may comprise different naming rules, these products may also identify the same resource differently in many cases. As a result of the CMDB receiving resource identity data from various resource management products that identify the same resource differently, resource identity data is frequently duplicated in a CMDB, both logically and physically, as taught by Tai et al (Paragraph [0006]).
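
For illustration only, the two-pass field-group hashing quoted above from MUEHLICH (Paragraphs [0015], [0017]) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the reference: the record layout, the choice of field groups, the trim-and-lowercase normalization, the treatment of missing fields, the use of SHA-256, and the helper names group_hash and find_doublets are all hypothetical.

```python
import hashlib
from itertools import combinations

# Hypothetical field groups; each group has at least two fields, as in [0015].
FIELD_GROUPS = [("name", "email"), ("id_card_number", "country")]


def group_hash(record, group):
    """Hash the values of one field group; return None if any field is missing."""
    values = [record.get(field) for field in group]
    if any(v is None or v == "" for v in values):
        return None  # assumption: incomplete groups never match anything
    # Assumed normalization: trim and lowercase, joined with a unit separator.
    joined = "\x1f".join(str(v).strip().lower() for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def find_doublets(records):
    """records maps a unique record ID to a dict of field values."""
    # First pass: compute and store one hash per field group for each record.
    hashes = {
        rid: [group_hash(rec, group) for group in FIELD_GROUPS]
        for rid, rec in records.items()
    }
    # Second pass: two records are a doublet when the hashes of at least
    # one field group are equal.
    doublets = []
    for a, b in combinations(sorted(hashes), 2):
        if any(ha is not None and ha == hb for ha, hb in zip(hashes[a], hashes[b])):
            doublets.append((a, b))
    return doublets
```

In this sketch the pairwise comparison is quadratic in the number of records; it is kept that way only to mirror the "comparing the hash values of two records" wording of the quoted paragraph.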

Regarding dependent claim 2, MUEHLICH et al and Tai et al teach, the method according to claim 1. 
MUEHLICH et al further teaches, wherein the database system is a relational database system (Paragraph [0014] the data in the database may be organized in database tables comprising a plurality of records being organized in table fields. The database may be a relational database).

Regarding dependent claim 3, MUEHLICH et al and Tai et al teach, the method according to claim 1. 
Tai et al further teaches, wherein the database system is a configuration management database  that underlies a specific internal organization. (Paragraph 

Regarding dependent claim 4, MUEHLICH et al and Tai et al teach, the method according to claim 1. 
MUEHLICH et al further teaches, further comprising merging the identified duplicate data items by maintaining the determined hash values as a multi-valued key for a merged data item (Paragraph [0022] According to an embodiment of the invention, the records are stored in a first table. In general, the database may comprise a raw data table, comprising the new data to be included into the database and a cleansed data table with records already removed from doublets. With this method, the new data has to be included into the cleansed data without generation doublets. To achieve this, the hash values may be stored in a separate (matching) table together with a unique ID for the corresponding record (i.e., maintaining a multi-valued key for the merged item for a determined hash value in a separate table). All the tables, the raw data table, the cleansed data table and the matching table may comprise a unique ID field, uniquely identifying each record. That means a doublet (two duplicate records) may comprise several unique IDs). 

Regarding dependent claim 8, MUEHLICH et al and Tai et al teach, the method according to claim 1. 
MUEHLICH et al further teaches, further comprising: using a create SQL statement … (Paragraph [0014] the data in the database may be organized in database tables comprising a plurality of records being organized in table fields. The database may be a relational database. Furthermore, the method may be performed by the database itself, for example by a SQL program executed by the database. (SQL includes data query, data manipulation such as (insert, update and delete), data definition (schema creation and modification), and data access control)).
Tai et al further teaches,… creating of the naming rule and its related priority (Paragraphs [0036], [0037] A naming rule specifies the attributes or combination of attributes used to indicate characteristics or properties of the resource and that can be used to uniquely identify a resource. In addition, a resource may be identifiable using more than one naming rule. Each naming rule has an associated priority. The priority of a naming rule is based on the level of uniqueness the naming rule provides in identifying resources. For instance, the combination of Manufacturer, Model, and Serial Number (MMS) attributes is more unique when identifying a resource than an IP address or a MAC address, which may be often only temporarily assigned to a particular resource).
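
For illustration only, the association of a priority with each naming rule, as quoted above from Tai (Paragraphs [0036], [0037]), can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the reference: the NamingRule structure, the numeric priorities (lower number meaning higher priority), the relative order of the IP and MAC rules, the "/" separator, and the helper name derive_name are all hypothetical. Only the statement that the Manufacturer, Model, and Serial Number combination is more unique than an IP address or a MAC address comes from the quoted passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamingRule:
    name: str
    attributes: tuple  # attribute combination the rule uses
    priority: int      # assumed convention: lower number = higher priority


# Hypothetical rule set; MMS outranks the IP and MAC rules.
RULES = [
    NamingRule("ip", ("ip_address",), 3),
    NamingRule("mms", ("manufacturer", "model", "serial_number"), 1),
    NamingRule("mac", ("mac_address",), 2),
]


def derive_name(resource, rules=RULES):
    """Try rules in priority order; return (rule name, derived name) or None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        values = [resource.get(attr) for attr in rule.attributes]
        if all(values):  # every attribute of the rule must be present
            return rule.name, "/".join(str(v) for v in values)
    return None
```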

Regarding independent claim 13, MUEHLICH; Christoph (US 20170011088 A1) teaches, a computer system for unambiguously identifying entities in the database system, the computer system comprising: one or more computer processors (Paragraph [0035] A further aspect of the invention relates to a computer program for finding doublets in a database table, which, when being executed by a processor), one or more computer-readable storage media, and program instructions stored on the one or more of the computer-readable storage media for execution by at least one of the one or more processors capable of performing a method (Paragraph [0036], [0038] A further aspect of the invention relates to a computer-readable medium in which such a computer program is stored), the method comprising: storing data items in a table of a database, wherein the data items are stored as records comprising a plurality of attributes (Paragraph [0012] A first aspect of the invention relates to a method for finding doublets in a database. As defined above, a doublet or duplicate may be two or more records in the database, which refer to the same logical data, such as a specific entity, person, device, machine, etc. These two or more records need not contain the same data, but only similar data, due to, for example, misspellings and/or, missing field data. [0014] the data in the database may be organized in database tables comprising a plurality of records being organized in table fields); 
determining a hash value for each of the selected combinations of the attributes of the data items, and identifying duplicate data items using the determined hash values….
MUEHLICH et al fails to explicitly teach, storing naming rules for selected combinations of the attributes of the data items wherein the naming rules are based on the plurality of attributes; (Page 4 of 12, Application No.: 16/839,200) prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order of the naming rules, depending on an importance of the naming rules for entity detection; … and the prioritized naming rules.
Tai; Ling-Ching W (US 20120096163 A1) teaches, storing naming rules for selected combinations of the attributes of the data items wherein the naming rules are based on the plurality of attributes (Paragraphs [0036], [0037] A naming rule specifies the attributes or combination of attributes used to indicate characteristics or properties of the resource and that can be used to uniquely identify a resource (i.e., naming rules are based on the plurality of attributes to uniquely identify a resource). Also see, for clarity, Paragraph [0074] Each naming rule uses a different set of attributes to derive a valid name….A rule is determined to have the highest priority if it has been selected as such, by a CMDB manager, either according to complying with pre-set naming rules or by the CMDB manager manually choosing a particular rule to have the highest priority for a particular situation or for a particular resource);
prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order of the naming rules, depending on an importance of the naming rules for entity detection; … and the prioritized naming rules (Paragraph [0048] The order of the attributes in the URI string is defined and determined by the naming rule (i.e., prioritizing the naming rules by applying the sequence/order of the attributes in a URI string)).
Therefore it would have been obvious to one of the ordinary skill in the art before the effective filing date of the claimed invention, to have modified the teachings of MUEHLICH et al by storing naming rules for selected combinations of the attributes of the data items wherein the naming rules are based on the plurality of attributes; prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order of the naming rules, depending on an importance of the naming rules for entity detection; … and the prioritized naming rules, as taught by Tai et al (Paragraphs [0036], [0037], [0048]).


Regarding dependent claim 14, MUEHLICH et al and Tai et al teach, the computer system according to claim 13. 
MUEHLICH et al further teaches, wherein the database system is a relational database system (Paragraph [0014] the data in the database may be organized in database tables comprising a plurality of records being organized in table fields. The database may be a relational database).

Regarding dependent claim 15, MUEHLICH et al and Tai et al teach, the computer system according to claim 13. 
Tai et al further teaches, wherein the database system is a configuration management database (Paragraph [0036] Along with representing and storing relationships between resources, an important purpose of a CMDB (configuration management database) is to provide a correlation mechanism between resources).

Regarding dependent claim 16, MUEHLICH et al and Tai et al teach, the computer system according to claim 13. 
MUEHLICH et al further teaches, further comprising merging the identified duplicate data items by maintaining the determined hash values as a multi-valued key for a merged data item (Paragraph [0022] According to an embodiment of the invention, the records are stored in a first table. In general, the database may comprise a raw data table, comprising the new data to be included into the database and a cleansed data table with records already removed from doublets. With this method, the new data has to be included into the cleansed data without generation doublets. To achieve this, the hash values may be stored in a separate (matching) table together with a unique ID for the corresponding record (i.e., maintaining a multi-valued key for the merged item for a determined hash value in a separate table). All the tables, the raw data table, the cleansed data table and the matching table may comprise a unique ID field, uniquely identifying each record. That means a doublet (two duplicate records) may comprise several unique IDs). 

Regarding independent claim 20, MUEHLICH; Christoph (US 20170011088 A1) teaches, a computer program product for unambiguously identifying entities in a database system, the computer program product comprising: one or more non-transitory computer-readable storage media and program instructions stored on the one or more non-transitory computer-readable storage media capable of performing a method, the method comprising: storing data items in a table of a database, wherein the data items are stored as records comprising a plurality of attributes (Paragraph [0012] A first aspect of the invention relates to a method for finding doublets in a database. As defined above, a doublet or duplicate may be two or more records in the database, which refer to the same logical data, such as a specific entity, person, device, machine, etc. These two or more records need not contain the same data, but only similar data, due to, for example, misspellings and/or, missing field data. [0014] the data in the database may be organized in database tables comprising a plurality of records being organized in table fields); 
determining a hash value for each of the selected combinations of the attributes of the data items, and identifying duplicate data items using the determined hash values …. (Paragraph [0015] According to an embodiment of the invention, the method comprises: (in a first pass) calculating hash values for at least two field groups for records in the database, wherein a field group comprises at least two fields, and the hash value of a field group for a record, which is based on the values in the at least two fields of the respective field group stored in the respective record; storing these hash values for each record; and (in a second pass), identifying doublets by comparing the hash values of two records, which were calculated during the first pass, wherein two records are a doublet, when the hash values of at least one field group are equal. [0017] the hash values are based on field groups, i.e. one hash value is not calculated from a value in a single field but from at least two fields. For example, a first field group may comprise the fields "name", "e-mail" and a second field group may comprise the fields "identification card number" and "country". A field group may be 
MUEHLICH et al fails to explicitly teach, storing naming rules for selected combinations of the attributes of the data items, wherein the naming rules are based on the plurality of attributes; prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order of the naming rules, depending on an importance of the naming rules for entity detection; … and the prioritized naming rules.
Tai; Ling-Ching W (US 20120096163 A1) teaches, storing naming rules for selected combinations of the attributes of the data items, wherein the naming rules are based on the plurality of attributes (Paragraphs [0036], [0037] A naming rule specifies the attributes or combination of attributes used to indicate characteristics or properties of the resource and that can be used to uniquely identify a resource (i.e., naming rules are based on the plurality of attributes to uniquely identify a resource). Also see, for clarity, Paragraph [0074] Each naming rule uses a different set of attributes to derive a valid name….A rule is determined to have the highest priority if it has been selected as such, by a CMDB manager, either according to complying with pre-set naming rules or by the CMDB manager manually choosing a particular rule to have the highest priority for a particular situation or for a particular resource).
prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order of the naming rules, depending on an importance of the naming rules for entity detection; … and the prioritized naming rules (Paragraph [0048] The order of the attributes in the URI string is defined and determined by the 
Therefore it would have been obvious to one of the ordinary skill in the art before the effective filing date of the claimed invention, to have modified the teachings of MUEHLICH et al by storing naming rules for selected combinations of the attributes of the data items, wherein the naming rules are based on the plurality of attributes; prioritizing the naming rules by defining a sequence of applying the naming rules or defining an order of the naming rules, depending on an importance of the naming rules for entity detection; … and the prioritized naming rules, as taught by Tai et al (Paragraphs [0036], [0037], [0048]).
One of the ordinary skill in the art would have been motivated to make this modification where each product may maintain its own separate data that is related to the set of resources it manages, and each product may comprise its own set of naming rules for naming or identifying a resource. As the products may comprise different naming rules, these products may also identify the same resource differently in many cases. As a result of the CMDB receiving resource identity data from various resource management products that identify the same resource differently, resource identity data is frequently duplicated in a CMDB, both logically and physically, as taught by Tai et al (Paragraph [0006]).

6. 	Claims 5 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over MUEHLICH; Christoph (US 20170011088 A1) in view of Tai; Ling-Ching W (US 20120096163 A1) and in further view of Chaudhuri, Surajit (US 20040003005 A1).

Regarding dependent claim 5, MUEHLICH et al and Tai et al teach, the method according to claim 4. 
MUEHLICH et al and Tai et al fail to explicitly teach, further comprising merging other data items that are in composite relationship with the identified data items.
Chaudhuri, Surajit (US 20040003005 A1) teaches, further comprising merging other data items that are in composite relationship with the identified data items (Paragraph [0029] Consider the tuples USA and United States in the Country relation in FIG. 2. The state attribute value "MO" appears in tuples in the State relation joining with countries USA and United States, whereas most state attribute values occur only with a single Country tuple. That is, USA and United States co-occur through the state MO. In general, country tuples are associated with sets of State attribute values. And, an unusually significant overlap--called significant co-occurrence through the State relation--between two sets is a good reason for suspecting that the two countries are duplicates).
Therefore it would have been obvious to one of the ordinary skill in the art before the effective filing date of the claimed invention, to have modified the teachings of MUEHLICH et al by providing dimensional hierarchies to develop an efficient, scalable, duplicate data identification process which significantly reduces the number of false positives while detecting a high percentage of duplicates. The exemplary process allows detection of equivalent data in each relation or table within the hierarchies of relations, as taught by Chaudhuri et al (Paragraph [0010]).
One of the ordinary skill in the art would have been motivated to make this modification for reasons of efficiency and scalability, an exemplary embodiment avoids 

Regarding dependent claim 17, MUEHLICH et al, Tai et al and Chaudhuri et al teach, the computer system according to claim 16. 
MUEHLICH et al and Tai et al fail to explicitly teach, further comprising merging other data items that are in composite relationship with the identified data items.
Chaudhuri, Surajit (US 20040003005 A1) teaches, further comprising merging other data items that are in composite relationship with the identified data items (Paragraph [0029] Consider the tuples USA and United States in the Country relation in FIG. 2. The state attribute value "MO" appears in tuples in the State relation joining with countries USA and United States, whereas most state attribute values occur only with a single Country tuple. That is, USA and United States co-occur through the state MO. In general, country tuples are associated with sets of State attribute values. And, an unusually significant overlap--called significant co-occurrence through the State relation--between two sets is a good reason for suspecting that the two countries are duplicates).
Therefore it would have been obvious to one of the ordinary skill in the art before the effective filing date of the claimed invention, to have modified the teachings of MUEHLICH et al by providing dimensional hierarchies to develop an efficient, scalable, duplicate data identification process which significantly reduces the number of false positives while detecting a high percentage of duplicates. The exemplary process allows detection of equivalent data in each relation or table within the hierarchies of relations, as taught by Chaudhuri et al (Paragraph [0010]).


7. 	Claims 6-7 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over MUEHLICH; Christoph (US 20170011088 A1) in view of Tai; Ling-Ching W (US 20120096163 A1) and in further view of Beecham; James Douglas (US 20190318117 A1).

Regarding dependent claim 6, MUEHLICH et al and Tai et al teach, the method according to claim 4. 
MUEHLICH et al and Tai et al fail to explicitly teach, further comprising maintaining a pointer to a same row identifier of one of the merged data items for the determined hash values.
Beecham; James Douglas (US 20190318117 A1) teaches, further comprising maintaining a pointer to a same row identifier of one of the merged data items for the determined hash values (Paragraph [0226] Similar operations may be performed when writing, for example, by grouping data classified as high security according to the value, for example, by sorting the data and then detecting groups of instances in which the values are the same, or storing the data in a hash table and detecting duplicates with hash collisions where the same values are written to the same index of the hash table. In 
Therefore it would have been obvious to one of the ordinary skill in the art before the effective filing date of the claimed invention, to have modified the teachings of MUEHLICH et al by maintaining a pointer to a same row identifier of one of the merged data items for the determined hash values as taught by Beecham et al (Paragraph [0226]).
One of the ordinary skill in the art would have been motivated to make this modification for expediting the operations by assigning pointers or other types of unique identifiers that are based on the content of the values to which the pointers point, for example, based on cryptographic hash values based solely on the content of the values to which the pointers point. As a result, different instances, of the same value, for example, in different rows or other tuples of a database may correspond to the same pointer. These pointers can be said to be unique identifiers in the sense that they uniquely identify content, in some cases without revealing the semantic information in that content, for instance, with the cryptographic hash identifier, while still having the same unique identifier replicated for multiple instances of that value appearing at multiple rows and a database as taught by Beecham et al (Paragraph [0224]).
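
For illustration only, the content-based pointer assignment quoted above from Beecham (Paragraphs [0224], [0226]) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the reference: the row layout, the use of SHA-256 as the cryptographic hash, the in-memory dictionary standing in for the secure datastore, and the helper name assign_pointers are all hypothetical.

```python
import hashlib


def assign_pointers(rows, column):
    """Replace each value in `column` with a pointer derived from its content.

    Returns (rewritten rows, store mapping pointer -> original value).
    """
    store = {}      # hypothetical stand-in for the secure datastore
    rewritten = []
    for row in rows:
        value = row[column]
        # The pointer depends solely on the content of the value, so every
        # instance of the same value receives the same pointer.
        pointer = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        store.setdefault(pointer, value)  # duplicates land on the same key
        new_row = dict(row)               # leave the input row unchanged
        new_row[column] = pointer
        rewritten.append(new_row)
    return rewritten, store
```

Rows that carry the same value end up holding the same pointer, so duplicates can be detected by comparing pointers without reading the underlying values.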

Regarding dependent claim 7, MUEHLICH et al and Tai et al teach, the method according to claim 1. 
MUEHLICH et al and Tai et al fail to explicitly teach, further comprising: maintaining an index of the table; and maintaining a pointer in a search tree related to the index, such that the pointer points to the same record identifiers of a combined data item.
Beecham et al further teaches, further comprising: maintaining an index of the table; and maintaining a pointer in a search tree related to the index, such that the pointer points to the same record identifiers of a combined data item (Paragraph [0226] Similar operations may be performed when writing, for example, by grouping data classified as high security according to the value, for example, by sorting the data and then detecting groups of instances in which the values are the same, or storing the data in a hash table and detecting duplicates with hash collisions where the same values are written to the same index of the hash table. In these examples, some embodiments may then assign the same unique identifier to each instance in the group where this value is the same, and cause that unique identifier, which may serve as a pointer, to be stored in place of those higher-security values in the first remote database).
Therefore it would have been obvious to one of the ordinary skill in the art before the effective filing date of the claimed invention, to have modified the teachings of MUEHLICH et al by maintaining a pointer to a same row identifier of one of the merged data items for the determined hash values as taught by Beecham et al (Paragraph [0226]).
One of the ordinary skill in the art would have been motivated to make this modification for expediting the operations, an index may be maintained in which the pointers are associated with values that indicate whether the values are responsive to certain criteria (e.g., a threshold number of prefix characters or suffix characters), and 

Regarding dependent claim 18, MUEHLICH et al and Tai et al teach, the computer system according to claim 16. 
MUEHLICH et al and Tai et al fail to explicitly teach, further comprising maintaining a pointer to a same row identifier of one of the merged data items for the determined hash values.
Beecham; James Douglas (US 20190318117 A1) teaches, further comprising maintaining a pointer to a same row identifier of one of the merged data items for the determined hash values (Paragraph [0226] Similar operations may be performed when writing, for example, by grouping data classified as high security according to the value, for example, by sorting the data and then detecting groups of instances in which the values are the same, or storing the data in a hash table and detecting duplicates with hash collisions where the same values are written to the same index of the hash table. In these examples, some embodiments may then assign the same unique identifier to each instance in the group where this value is the same, and cause that unique identifier, which may serve as a pointer, to be stored in place of those higher-security values in the first remote database (i.e., maintaining a pointer to the same unique identifier/row identifier of the merged data items for the determined hash values)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of MUEHLICH et al by maintaining a pointer to a same row identifier of one of the merged data items for the determined hash values, as taught by Beecham et al (Paragraph [0226]).
One of ordinary skill in the art would have been motivated to make this modification in order to expedite operations: an index may be maintained in which the pointers are associated with values that indicate whether the values are responsive to certain criteria (e.g., a threshold number of prefix characters or suffix characters), and embodiments may access this index to identify a subset of pointers for which values are retrieved from the secure datastore, as taught by Beecham et al (Paragraph [0223]).
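For illustration only, the duplicate-detection operation quoted from Beecham (Paragraph [0226]) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any cited reference: the function name `assign_shared_identifiers`, the integer identifiers, and the sample values are hypothetical, and a Python dictionary stands in for the hash table.

```python
from itertools import count


def assign_shared_identifiers(values):
    """Give every distinct value one identifier and return a pointer per record.

    Hypothetical helper: equal values hash to the same dictionary key, so a
    repeated value is detected as a duplicate and reuses the identifier that
    was assigned the first time the value was seen.
    """
    next_id = count(1)   # source of fresh identifiers (assumed to be integers)
    id_by_value = {}     # hash table: value -> shared identifier
    pointers = []        # identifier stored in place of each original value
    for value in values:
        if value not in id_by_value:
            id_by_value[value] = next(next_id)
        pointers.append(id_by_value[value])
    return pointers, id_by_value


# Records with the same value end up pointing at the same identifier.
pointers, table = assign_shared_identifiers(
    ["alice", "bob", "alice", "carol", "bob"]
)
```

In this sketch the first and third records share one identifier, which plays the role of the pointer to a common row identifier for the merged items.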


Regarding dependent claim 19, MUEHLICH et al and Tai et al teach, the computer system according to claim 13. 
MUEHLICH et al and Tai et al fail to explicitly teach, further comprising: maintaining an index of the table; and maintaining a pointer in a search tree related to the index, such that the pointer points to the same record identifiers of a combined data item.
Beecham et al further teaches, further comprising: maintaining an index of the table; and maintaining a pointer in a search tree related to the index, such that the pointer points to the same record identifiers of a combined data item (Paragraph [0226] Similar operations may be performed when writing, for example, by grouping data classified as high security according to the value, for example, by sorting the data and then detecting groups of instances in which the values are the same, or storing the data in a hash table and detecting duplicates with hash collisions where the same values are written to the same index of the hash table. In these examples, some embodiments may then assign the same unique identifier to each instance in the group where this value is the same, and cause that unique identifier, which may serve as a pointer, to be stored in place of those higher-security values in the first remote database).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of MUEHLICH et al by maintaining a pointer to a same row identifier of one of the merged data items for the determined hash values, as taught by Beecham et al (Paragraph [0226]).
One of ordinary skill in the art would have been motivated to make this modification in order to expedite operations: an index may be maintained in which the pointers are associated with values that indicate whether the values are responsive to certain criteria (e.g., a threshold number of prefix characters or suffix characters), and embodiments may access this index to identify a subset of pointers for which values are retrieved from the secure datastore, as taught by Beecham et al (Paragraph [0223]).


8. 	Claims 9-11 are rejected under 35 U.S.C. 103 as being unpatentable over MUEHLICH; Christoph (US 20170011088 A1) in view of Tai; Ling-Ching W (US 20120096163 A1) and in further view of Bodapati; Chandra (US 20100005048 A1).

Regarding dependent claim 9, MUEHLICH et al and Tai et al teach, the method according to claim 1. 
MUEHLICH et al and Tai et al fail to explicitly teach, further comprising: using a multi-value primary key for sorting records in the table of the database.
Bodapati; Chandra (US 20100005048 A1) teaches, further comprising: using a multi-value primary key for sorting records in the table of the database (Paragraph [0065] In an embodiment normalization eliminates the noises including but not limited to titles in names and symbols used in some fields. Further, in the embodiment the swapped tokens are normalized and sorted to assign same keys to swapped entries wherein "Bob Gales" and "Gales Bob" are assigned the keys (N1, N2) (i.e., multi value keys are used in sorting the records)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of MUEHLICH et al by providing a method of detecting and removing duplicate records, wherein each record comprises of data relating to a plurality of fields, the method comprising the steps of standardizing data using field specific knowledge base; extracting at least part of one or more related fields of records; applying a matching attribute function to generate keys on the "comparable" field part extracted data; generating record level keys using generated field level keys; clustering the records based on generated record level keys; identifying reference record for each cluster identified; and calculating matching percentage for each record in a cluster with respect to reference record of the cluster, as taught by Bodapati (Paragraph [0007]).
One of ordinary skill in the art would have been motivated to make this modification because the objective of the clustering phase is to minimize the number of comparisons while also ensuring a good accuracy percentage. Similar records using different matching attributes are grouped in the clustering phase and the grouped records are sent to the comparison phase, as taught by Bodapati (Paragraph [0051]).
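For illustration only, the normalization and token sorting quoted from Bodapati (Paragraph [0065]) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the reference's implementation: the `TITLES` noise list, the `name_key` function, and the sample names other than "Bob Gales" and "Gales Bob" are hypothetical.

```python
import re

# Hypothetical list of title tokens treated as noise during normalization.
TITLES = {"mr", "mrs", "ms", "dr"}


def name_key(name):
    """Build a multi-value key from a name.

    Lower-cases the name, drops symbols and title tokens, then sorts the
    remaining tokens so that swapped entries receive the same key.
    """
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    tokens = [t for t in tokens if t not in TITLES]
    return tuple(sorted(tokens))


records = ["Bob Gales", "Gales Bob", "Dr. Alice Smith", "Smith, Alice"]

# Swapped entries map to one key, e.g. ("bob", "gales") for both orderings.
keys = [name_key(r) for r in records]

# Sorting on the multi-value key places records with equal keys side by side.
sorted_records = sorted(records, key=name_key)
```

Because Python's sort is stable, records that share a key keep their original relative order after sorting.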

Regarding dependent claim 10, MUEHLICH et al and Tai et al teach, the method according to claim 1. 
MUEHLICH et al and Tai et al fail to explicitly teach, wherein a multi-value primary key is used for clustering cluster data on multi-node database engines.
Bodapati; Chandra (US 20100005048 A1) teaches, wherein a multi-value primary key is used for clustering cluster data on multi-node database engines (Paragraph [0066] In an embodiment the record level keys are assigned by the matching attribute generator 245 as illustrated by FIG. 6(F) wherein the Name keys Na=(N1, N2) and Nb=(N3, N4) correspond to the name keys assigned as shown in FIG. 6(C) and the Company keys including Ca, Cb correspond to the keys assigned as shown in FIG. 6(E). The clustering Unit 240 clusters the records with same keys and sends it to the Comparing Unit 250. Also see [0051] the objective of the clustering phase is to minimize the number of comparisons and also ensuring a good accuracy percentage. Similar records using different matching attributes are grouped in the clustering phase and the grouped records are sent to the comparison phase (i.e., comparisons are made on multiple nodes which are grouped together in a cluster)).
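Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of MUEHLICH et al by providing a method of detecting and removing duplicate records, wherein each record comprises of data relating to a plurality of fields, the method comprising the steps of standardizing data using field specific knowledge base; extracting at least part of one or more related fields of records; applying a matching attribute function to generate keys on the "comparable" field part extracted data; generating record level keys using generated field level keys; clustering the records based on generated record level keys; identifying reference record for each cluster identified; and calculating matching percentage for each record in a cluster with respect to reference record of the cluster, as taught by Bodapati (Paragraph [0007]).
One of ordinary skill in the art would have been motivated to make this modification because the objective of the clustering phase is to minimize the number of comparisons while also ensuring a good accuracy percentage. Similar records using different matching attributes are grouped in the clustering phase and the grouped records are sent to the comparison phase, as taught by Bodapati (Paragraph [0051]).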
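For illustration only, the clustering on record-level keys quoted from Bodapati (Paragraphs [0051] and [0066]) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the reference's implementation: the functions `record_key` and `cluster_by_key`, the way the name and company keys are derived, and the sample records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def record_key(rec):
    """Hypothetical record-level key built from two field-level keys."""
    name_key = tuple(sorted(rec["name"].lower().split()))
    company_key = rec["company"].lower()
    return (name_key, company_key)


def cluster_by_key(records, key_fn):
    """Group records whose record-level keys are equal."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[key_fn(rec)].append(rec)
    return dict(clusters)


records = [
    {"id": 1, "name": "Bob Gales", "company": "Acme"},
    {"id": 2, "name": "Gales Bob", "company": "ACME"},
    {"id": 3, "name": "Ann Lee", "company": "Initech"},
    {"id": 4, "name": "Lee Ann", "company": "Initech"},
    {"id": 5, "name": "Ann Lee", "company": "Globex"},
]

clusters = cluster_by_key(records, record_key)

# Only records inside the same cluster are sent on for pairwise comparison.
pairs_within = [
    (a["id"], b["id"])
    for members in clusters.values()
    for a, b in combinations(members, 2)
]

# Number of comparisons that would be needed without clustering.
all_pairs = len(list(combinations(records, 2)))
```

With these sample records, clustering leaves two candidate pairs to compare instead of the ten pairs an exhaustive comparison would require, which is the reduction in comparisons that the quoted passage describes as the objective of the clustering phase.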

Regarding dependent claim 11, MUEHLICH et al, Tai et al and Bodapati et al teach, the method according to claim 9. 
Bodapati et al further teaches, wherein a multi-value primary key is comparable to a single value column data item (Fig. 6F; Paragraph [0069] The Comparing Unit 250 compares the records to generate match percentage wherein the reference record is chosen from the cluster and the pair-wise percentage is computed. An adaptive process is involved to designate a record as the reference record. The reference record is selected based on criteria. One such criterion may be that the primary fields of the record should contain maximum information as compared to the other records in the cluster. Primary fields may include important fields like name or company for the example considered).

9. 	Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over MUEHLICH; Christoph (US 20170011088 A1) in view of Tai; Ling-Ching W (US 20120096163 A1) and in further view of CASSIDY; Hugh (US 20170308557 A1).

Regarding dependent claim 12, MUEHLICH et al and Tai et al teach, the method according to claim 1. 
MUEHLICH et al and Tai et al fail to explicitly teach, further comprising collecting statistical database data for data blocks for single-valued primary keys and multi-valued primary keys.
CASSIDY; Hugh (US 20170308557 A1) teaches, further comprising collecting statistical database data for data blocks for single-valued primary keys and multi-valued primary keys (Paragraph [0034] The profiling of data relates to analysis of data with respect to statistical properties of data distribution, format of data, quality of data, and the like. The profiling of data can provide information regarding valid addresses, missing fields and can also be used to identify problems associated with stored data like wrong values in the fields).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the teachings of MUEHLICH et al by providing profiling of data, which relates to analysis of data with respect to statistical properties of data distribution, format of data, quality of data, and the like. The profiling of data can provide information regarding valid addresses, missing fields and can also be used to identify problems associated with stored data like wrong values in the fields, as taught by CASSIDY (Paragraph [0034]).
One of ordinary skill in the art would have been motivated to make this modification because the profiling of data is used to extract fields that can be used to filter out unnecessary (also known as `garbage`) records. Examples of garbage records may include records that contain dummy business names, addresses or phone numbers. Additionally, the records which include exact duplication across multiple fields are also indicative of garbage. In an embodiment, a unique identity, for example a signature, is created for each record. In an embodiment, identifying garbage field is a result of a pre-defined training exercise. The profiling of data relates to analysis of data with respect to statistical properties of data distribution, format of data, quality of data, and the like, as taught by CASSIDY.

Conclusion
Applicant’s amendments/arguments necessitated the rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SUMAN RAJAPUTRA whose telephone number is (571) 272-4669. The examiner can normally be reached between 8:00 AM and 5:00 PM.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).

/S. R./ 
Examiner, Art Unit 2164

/ASHISH THOMAS/Supervisory Patent Examiner, Art Unit 2164