DETAILED ACTION

Remarks
The present application is being examined under the pre-AIA first to invent provisions.
The following is a quotation of 35 U.S.C. 112(f):
 (f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Use of the word “means” (or “step for”) in a claim with functional language creates a rebuttable presumption that the claim element is to be treated in accordance with 35 U.S.C. § 112(f).  The presumption that § 112(f) is invoked is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function. 
Absence of the word “means” (or “step for”) in a claim creates a rebuttable presumption that the claim element is not to be treated in accordance with 35 U.S.C. § 112(f).  The presumption that § 112(f) is not invoked is rebutted when the claim element recites function but fails to recite sufficiently definite structure, material or acts to perform that function. 
Claim elements in this application that use the word “means” (or “step for”) are presumed to invoke § 112(f) except as otherwise indicated in an Office action.  Similarly, claim elements that do not use the word “means” (or “step for”) are presumed not to invoke § 112(f) except as otherwise indicated in an Office action.
Claim 20 invokes § 112(f) due to the recitation of “means for receiving data records…” and “means for processing the received data records…” The first of these two limitations, “means for receiving…”, is interpreted as incorporating the appropriate hardware disclosed in the instant specification for fulfilling the function of receiving data records, i.e. a programmable computer system comprising at least one input device or port (see para. 0284 of the published specification). The second of the two limitations, “means for processing…”, is interpreted as incorporating the appropriate hardware disclosed in the instant specification for fulfilling the function of processing the received data records, i.e. a programmable computer system including at least one processor and at least one data storage system (see para. 0284 of the published specification).

Response to Amendment
This communication is in response to the amendment filed on 17 December 2021.
Claim 13 is canceled.
Claims 1 and 18-20 are amended.
Claims 1-12 and 14-20 have been examined. 

Response to Arguments
In response to Applicant’s remarks filed on 17 December 2021:
a.	Rejection of claim 19 under 35 U.S.C. 101 is withdrawn in view of Applicant’s amendment.


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the claims at issue are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on a nonstatutory double patenting ground provided the reference application or patent either is shown to be commonly owned with this application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/forms/. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to http://www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-12 and 14-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-21 of U.S. Patent No. 10,572,511 B2.
Claim 1 of the examined application (Application No. 16/779743) is mapped to claim 1 of U.S. Patent No. 10,572,511 B2 in the following table:
In each row below, “Application” refers to claim 1 of Application No. 16/779743 and “Patent” refers to claim 1 of U.S. Patent No. 10,572,511 B2.

Application: 1. A method, including:
Patent: 1. A method, including:

Application: receiving data records, the received data records each including one or more values in one or more fields; and
Patent: receiving data records, the received data records each including one or more values in one or more fields; and

Application: processing the received data records to identify a matched data cluster to associate with each received data record, the processing including,
Patent: processing the received data records to identify a matched data cluster to associate with each received data record, the processing including:

Application: for selected data records from the received data records, generating a query from the one or more values included in the selected data record;
Patent: for at least one selected data record from the received data records, generating a first query from a first set of one or more values included in the selected data record including identifying tokens that each include a representation of at least one value or fragment of a value in a field or a combination of fields of the selected record and generating a second query from a second set of one or more values included in the selected data record, where the second set of one or more values is different from the first set of one or more values;

Application: identifying one or more candidate data records from the received data records using the query;
Patent: identifying a first set of one or more candidate data records from the received data records using the first query;

Application: (no corresponding limitation)
Patent: identifying a second set of one or more candidate data records from the received data records using the second query, the second set of one or more candidate data records partially overlapping the first set of one or more candidate data records;

Application: (no corresponding limitation)
Patent: determining a third set of one or more candidate data records as a Boolean combination of the first set of one or more candidate data records and the second set of one or more candidate data records;

Application: determining that the selected data record satisfies a cluster membership criterion for a plurality of existing data clusters;
Patent: determining whether or not the selected data record satisfies a cluster membership criterion for at least one candidate data cluster of one or more existing data clusters containing one or more candidate data records from at least one of the first set of one or more candidate data records or the second set of one or more candidate data records, the determining including applying the cluster membership criterion to the third set of one or more candidate data records; and

Application: selecting the matched data cluster for the selected data record from the plurality of existing data clusters; and
Patent: selecting the matched data cluster from among one or more candidate data clusters,

Application: (no corresponding limitation)
Patent: or initializing the matched data cluster with the selected data record if the selected data record does not satisfy a cluster membership criterion for any of the existing data clusters.

Application: providing an indication that the selected data record satisfies the cluster membership criterion for the plurality of existing clusters and of the selected matched data cluster to a user.
Patent: (no corresponding limitation)

All limitations of claim 1 of the examined application are present in claim 1 of U.S. Patent No. 10,572,511 B2, except for the final limitation, i.e., “providing an indication that the selected data record satisfies the cluster membership criterion for the plurality of existing clusters and of the selected matched data cluster to a user.” However, U.S. Patent Application Publication No. 20060242147 A1 to Gehrking et al., hereinafter referred to as Gehrking, teaches this subject matter (see Gehrking para. 0103 and Fig. 19: a given document, e.g. a website, is clustered into candidate clusters, and the results are presented to a user to approve/disapprove each candidate cluster membership). It would have been obvious to one having ordinary skill in the art at the time of the invention to have modified claim 1 of U.S. Patent No. 10,572,511 B2 to include the teachings of Gehrking because it enables a user to manually approve/disapprove cluster memberships (see Gehrking para. 0103 and Fig. 19), allowing review by a human expert to enhance quality control. Therefore, claim 1 of the examined application would have been obvious over claim 1 of U.S. Patent No. 10,572,511 B2 in view of Gehrking.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-12 and 14-20 are rejected under pre-AIA 35 U.S.C. 112, first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor(s), at the time the application was filed, had possession of the claimed invention.
As to independent claims 1 and 18-20, these claims have been amended to recite the following newly added limitations:
“maintaining an association of data records with respective data clusters, each data record belonging to a single data cluster of the respective data clusters, the maintaining including processing the received data records to identify a matched data cluster to associate with each received data record, the processing including,” and
“receiving, from a user, an instruction to modify an association of a particular received data record of the received data records with its matched data cluster, the modification including changing the association of the particular received data record from the matched data cluster to another data cluster of the plurality of existing data clusters that satisfy the cluster membership criterion.”

On page 10 of Applicant’s remarks dated 17 December 2021, Applicant states that support for these newly added limitations can be found in original claim 13 and paragraphs [034]-[035] and [096]-[098] of the published specification. However, neither original claim 13 nor the cited paragraphs make any mention of “the modification including changing the association of the particular received data record from the matched data cluster to another data cluster of the plurality of existing data clusters that satisfy the cluster membership criterion,” nor is it apparent how Applicant’s disclosure supports this subject matter. Hence, these newly added limitations are deemed to introduce new matter.

As to claims 2-12 and 14-17, they depend from claim 1 and therefore inherit its deficiencies.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
 (a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-9, 11, 12, and 15-20 are rejected under 35 U.S.C. 103(a) as being unpatentable over Anderson et al. (U.S. Patent Application Publication No. 20100106724 A1, hereinafter referred to as Anderson) in view of Gehrking et al. (U.S. Patent Application Publication No. 20060242147 A1, hereinafter referred to as Gehrking) and Bhattacharya et al. (U.S. Patent Application Publication No. 20120072421 A1, hereinafter referred to as Bhattacharya).
As to claim 1, Anderson teaches a method including:
receiving data records, the received data records each including one or more values in one or more fields (see Anderson para. 0075: dataset comprising records, each record having values for respective fields) and
maintaining an association of data records with respective data clusters, each data record belonging to a single data cluster of the respective data clusters (see Anderson para. 0037: the system forms clusters of data elements, each data element belonging to one cluster), the maintaining including processing the received data records to identify a matched data cluster to associate with each received data record (see Anderson para. 0104 and Fig. 2A: clustering data records), the processing including,
for selected data records from the received data records, generating a query from the one or more values included in the selected data record (Note: the claimed “query” is interpreted in light of the instant specification, which states the following: “Generating the query includes identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields of the selected data record.” see para. 0006 of the published specification;
see Anderson para. 0104: field values of a data record are used to score the data record and compare it to existing cluster members; Note: field values of a data record, as taught by Anderson, correspond to the claimed “query”);
identifying one or more candidate data records from the received data records using the query (see Anderson para. 0104: field values of a data record are used to score the data record and compare it to cluster members; Note: Anderson’s cluster members correspond to candidate data records);
determining that the selected data record satisfies a cluster membership criterion for a plurality of existing data clusters (see Anderson para. 0104: comparison test used to determine cluster membership);
selecting the matched data cluster for the selected data record from the plurality of existing data clusters (see Anderson para. 0104: when the comparison test yields a score above a threshold, the corresponding data record is deemed a member of the cluster).
Anderson does not appear to explicitly disclose providing an indication that a selected data record satisfies a cluster membership criterion for a plurality of existing clusters and of a selected matched data cluster to a user.
However, Gehrking teaches:
providing an indication that a selected data record satisfies a cluster membership criterion for a plurality of existing clusters and of a selected matched data cluster to a user (see Gehrking para. 0103 and Fig. 19: a given document, e.g. a website, is clustered into candidate clusters, and the results are presented to a user to approve/disapprove each candidate cluster membership).
It is important to avoid mistakes or inaccuracies that may degrade data quality and result in erroneous clustering/classification (see Anderson para. 0004). Therefore, it would have been obvious to one having ordinary skill in the art at the time of the invention to have modified Anderson to include the teachings of Gehrking because it enables a user to manually approve/disapprove cluster memberships (see Gehrking para. 0103 and Fig. 19), allowing review by a human expert to enhance quality control.
Anderson as modified by Gehrking does not appear to explicitly disclose receiving, from a user, an instruction to modify an association of a particular received data record of received data records with its matched data cluster, the modification including changing an association of the particular received data record from a matched data cluster to another data cluster of a plurality of existing data clusters.
However, Bhattacharya teaches receiving, from a user, an instruction to modify an association of a particular received data record of received data records with its matched data cluster, the modification including changing an association of the particular received data record from a matched data cluster to another data cluster of a plurality of existing data clusters (see Bhattacharya para. 0023 and 0029: re-assignment of data instance from one existing cluster to another) that satisfy a cluster membership criterion (see Bhattacharya para. 0021: cluster membership conditions).
It is widely accepted that clustering performed without supervision often does not lead to meaningful insights for the user and that supervised clustering may improve clustering results (see Bhattacharya paragraphs 0001 and 0019). It would have been obvious to one having ordinary skill in the art at the time of the invention to have modified Anderson as modified by Gehrking to include the teachings of Bhattacharya because it enables the user to supervise and guide the clustering process, resulting in significantly improved clustering performance (see Bhattacharya para. 0001 and 0019).
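For illustration only, and without altering the claim interpretation set forth above, the processing steps of claim 1 as mapped to Anderson (generating a token query from field values, identifying candidate records sharing tokens, applying a score-threshold membership criterion, selecting the matched cluster, and recording which existing clusters satisfied the criterion for indication to a user) may be sketched as follows. The token scheme, the Jaccard comparison score, the threshold, and all function names are assumptions introduced for this sketch; they are not drawn from the instant specification, Anderson, Gehrking, or Bhattacharya.

```python
from collections import defaultdict


def make_query(record):
    """Hypothetical query: each whole field value plus each word (fragment) of it."""
    tokens = set()
    for value in record.values():
        value = str(value).lower()
        tokens.add(value)
        tokens.update(value.split())
    return tokens


def similarity(a, b):
    """Assumed comparison score: Jaccard overlap of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_records(records, threshold):
    index = defaultdict(set)     # token -> ids of records already processed
    cluster_of = {}              # record id -> its single cluster id
    members = defaultdict(list)  # cluster id -> member record ids
    satisfied = {}               # record id -> clusters meeting the criterion
    for rid, record in enumerate(records):
        query = make_query(record)
        # Candidate records: earlier records sharing at least one query token.
        candidates = set().union(*(index[t] for t in query))
        scores = {}
        for cid in {cluster_of[c] for c in candidates}:
            best = max(similarity(query, make_query(records[m])) for m in members[cid])
            if best >= threshold:  # assumed cluster membership criterion
                scores[cid] = best
        satisfied[rid] = sorted(scores)  # what would be indicated to the user
        if scores:
            matched = max(scores, key=scores.get)  # best-scoring existing cluster
        else:
            matched = len(members)  # initialize a new cluster with this record
        cluster_of[rid] = matched
        members[matched].append(rid)
        for t in query:
            index[t].add(rid)
    return cluster_of, satisfied
```

For example, with the records {"name": "John Smith", "city": "Boston"}, {"name": "Jon Smith", "city": "Boston"}, and {"name": "Mary Jones", "city": "Denver"} and a threshold of 0.3, the second record shares the tokens “smith” and “boston” with the first (score 2/6) and joins its cluster, while the third record has no candidates and initializes a new cluster.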

As to claim 2, Anderson as modified by Gehrking and Bhattacharya teaches wherein generating the query includes identifying tokens, wherein each of the tokens includes a value that is found in a field set, and wherein the field set includes a field of the selected data record (see Anderson para. 0298: single words are identified from a multi-word field; Note: Anderson’s single words correspond to the claimed “tokens”).

As to claim 3, Anderson as modified by Gehrking and Bhattacharya teaches wherein generating the query includes identifying tokens, wherein each of the tokens includes a fragment of a value that is found in a field set that includes at least one field of the selected data record (see Anderson para. 0298: single words are identified from a multi-word field; Note: Anderson’s single words correspond to the claimed “tokens”).

As to claim 4, Anderson as modified by Gehrking and Bhattacharya teaches wherein processing the received data records to identify a matched data cluster to associate with each received data record further includes sorting an initial set of the received data records based on a distinguishability criterion, wherein the distinguishability criterion determines a degree to which a value included in a particular data record distinguishes that particular data record from other data records (see Anderson para. 0166: records are ordered based on inverse document frequency, which is a measure of how discriminating a particular keyword is; Note: Anderson’s inverse document frequency corresponds to the claimed “distinguishability criterion”).
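For illustration only, ordering records by inverse document frequency, which the mapping above treats as the claimed “distinguishability criterion,” may be sketched as follows. The choice of each record’s highest word IDF as its sort key, and the function name, are assumptions made for this sketch and are not taken from Anderson or the instant specification.

```python
import math
from collections import Counter


def sort_by_distinguishability(records):
    """Return record indices, most distinguishing record first.

    Assumption: a record's distinguishability is the highest inverse document
    frequency (IDF) among the words in its field values.
    """
    token_sets = [
        {w for v in r.values() for w in str(v).lower().split()} for r in records
    ]
    n = len(token_sets)
    doc_freq = Counter(t for toks in token_sets for t in toks)
    idf = {t: math.log(n / df) for t, df in doc_freq.items()}

    def score(i):
        return max((idf[t] for t in token_sets[i]), default=0.0)

    # sorted() is stable, so ties keep their original order even with reverse=True.
    return sorted(range(n), key=score, reverse=True)
```

A value appearing in fewer records receives a higher IDF, so a record containing a rare value (e.g. “Zenith” among records mostly reading “Acme”) is placed ahead of records containing only common values.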

As to claim 5, Anderson as modified by Gehrking and Bhattacharya teaches wherein selecting the matched data cluster for the selected data record from the plurality of existing data clusters includes comparing the selected data record to a representative data record from an existing data cluster, based on the comparison, calculating a comparison score (see Anderson para. 0104: field values of a data record are used to score the data record and compare it to cluster members), determining that the comparison score exceeds a first threshold, and as a result of having determined that the comparison score exceeds the first threshold, selecting the existing data cluster as the matched data cluster (see Anderson para. 0104: when the comparison test yields a score above a threshold, the corresponding data record is deemed a member of the cluster).

As to claim 6, Anderson as modified by Gehrking and Bhattacharya teaches wherein selecting the matched data cluster from the plurality of existing data clusters includes determining that the selected data record satisfies a cluster-membership criterion for a particular data cluster from among the existing data clusters (see Anderson para. 0104: comparison test used to determine cluster membership) and, based on the selected data record having satisfied the cluster-membership criterion, selecting the particular data cluster to be the matched data cluster (see Anderson para. 0104: when the comparison test yields a score above a threshold, the corresponding data record is deemed a member of the cluster).

As to claim 7, Anderson as modified by Gehrking and Bhattacharya teaches wherein identifying one or more candidate data records from the received data records includes comparing the query to queries in a data store and determining that the query maps to a first cluster, wherein the data store maps queries to existing clusters (Note: the claimed “query” is interpreted in light of the instant specification, which states the following: “Generating the query includes identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields of the selected data record.” see para. 0006 of the published specification;
see Gehrking para. 0042 and Fig. 3: terms from a document are mapped to clusters using term-cluster index 115; Note: Gehrking’s terms correspond to the claimed queries).

As to claim 8, Anderson as modified by Gehrking and Bhattacharya teaches a data store maps queries to existing clusters (Note: the claimed “query” is interpreted in light of the instant specification, which states the following: “Generating the query includes identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields of the selected data record.” see para. 0006 of the published specification;
see Gehrking para. 0042 and Fig. 3: terms from a document are mapped to clusters using term-cluster index 115; Note: Gehrking’s terms correspond to the claimed queries); and
further including receiving a request to map a selected data record to a second cluster (see Bhattacharya paragraphs 0002 and 0029-0032: a system for semi-supervised clustering allows a user to perform assignment feedback, in which the user moves a data point from one cluster to another) and updating a data store to map a query to a second cluster instead of to a first cluster (see Bhattacharya paragraph 0055: the existing clusters C are updated based on the current assignment of data points to clusters).

As to claim 9, Anderson as modified by Gehrking and Bhattacharya teaches a data store maps queries to existing clusters (Note: the claimed “query” is interpreted in light of the instant specification, which states the following: “Generating the query includes identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields of the selected data record.” see para. 0006 of the published specification;
see Gehrking para. 0042 and Fig. 3: terms from a document are mapped to clusters using term-cluster index 115; Note: Gehrking’s terms correspond to the claimed queries); and
further including receiving a request to map the selected data record to a new cluster (see Bhattacharya claim 3: the assignment feedback comprises re-assigning a clustered object to a new data cluster), updating a data store with a new cluster indicator (see Bhattacharya paragraph 0055: the existing clusters C are updated based on the current assignment of data points to clusters), generating a new cluster, and assigning the selected data record to the new cluster (see Bhattacharya claim 3: the assignment feedback comprises re-assigning a clustered object to a new data cluster).

As to claim 11, Anderson as modified by Gehrking and Bhattacharya teaches a data store maps queries to existing clusters (Note: the claimed “query” is interpreted in light of the instant specification, which states the following: “Generating the query includes identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields of the selected data record.” see para. 0006 of the published specification;
see Gehrking para. 0042 and Fig. 3: terms from a document are mapped to clusters using term-cluster index 115; Note: Gehrking’s terms correspond to the claimed queries); and
further including receiving a first request, the first request being a request to exclude membership of a selected data record in a first cluster (see Bhattacharya paragraph 0019: a system for semi-supervised clustering allows data instances to be marked as “cannot-link,” which is a constraint that prevents the clustering system from associating a data instance with a particular cluster), in response to the first request, updating a data store to change membership of the selected data record (see Bhattacharya paragraphs 0002 and 0029-0032: assignment feedback allows the user to move a data point from one cluster to another; and see Bhattacharya paragraph 0055: the existing clusters C are updated based on the current assignment of data points to clusters), receiving a second request to update the data store, the second request being associated with another data record, updating the data store in response to the second request, wherein, as a result of information having been written into the data store when updating the data store in response to the first request, the selected data record continues to be excluded from membership in the first cluster (see Bhattacharya paragraph 0019: a system for semi-supervised clustering allows data instances to be marked as “cannot-link,” which is a constraint that prevents the clustering system from associating a data instance with a particular cluster).
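For illustration only, the persistence relied upon in the mapping of claim 11 (an exclusion written to the data store in response to a first request continues to bar the record from the first cluster when a later, unrelated update is applied) may be sketched as follows. The class, its method names, and the fallback-cluster behavior are hypothetical and are not taken from Bhattacharya or the instant specification.

```python
class ClusterStore:
    """Hypothetical data store: record -> cluster map plus persisted exclusions."""

    def __init__(self):
        self.cluster_of = {}   # record id -> cluster id
        self.excluded = set()  # (record id, cluster id) pairs marked "cannot-link"

    def exclude(self, record_id, cluster_id, fallback):
        """First request: bar record_id from cluster_id and move it to fallback."""
        self.excluded.add((record_id, cluster_id))
        if self.cluster_of.get(record_id) == cluster_id:
            self.cluster_of[record_id] = fallback

    def apply_update(self, proposed):
        """Later update (e.g. a re-clustering pass): skip any excluded pairing."""
        for record_id, cluster_id in proposed.items():
            if (record_id, cluster_id) not in self.excluded:
                self.cluster_of[record_id] = cluster_id
```

Because the exclusion is stored rather than applied once, a second update that again proposes placing record “r1” in cluster “A” leaves “r1” in its fallback cluster, while the proposal for the other record is applied normally.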

As to claim 12, Anderson as modified by Gehrking and Bhattacharya teaches further including receiving input from a user to approve a proposed association of one of the received data records with a matched data cluster (see Gehrking para. 0103 and Fig. 19: clustering results are presented to a user to approve/disapprove each cluster membership).

As to claim 15, Anderson as modified by Gehrking and Bhattacharya teaches further including selecting the selected data record from an initial set of data records that have been sorted based on a distinguishability criterion that determines a degree to which a value included in a particular data record distinguishes that particular data record from other data records (see Anderson para. 0166: records are ordered based on inverse document frequency, which is a measure of how discriminating a particular keyword is; Note: Anderson’s inverse document frequency corresponds to the claimed “distinguishability criterion”).

As to claim 16, Anderson as modified by Gehrking and Bhattacharya teaches wherein processing the received data records includes sorting the data records based on a number of fields that are populated with a particular value, wherein the number of fields so populated is indicative of an extent to which the value distinguishes a data record from other data records (see Anderson para. 0166: records are ordered based on inverse document frequency, which is a measure of how discriminating a particular keyword is; Note: Anderson’s inverse document frequency corresponds to the claimed “distinguishability criterion”).

As to claim 17, Anderson as modified by Gehrking and Bhattacharya teaches wherein processing the received data records includes sorting the data records based on a number of tokens in one or more fields of the data records, wherein the number of tokens determines a degree to which a value included in a particular data record distinguishes that particular data record from other data records (see Anderson para. 0166: records are ordered based on inverse document frequency, which is a measure of how discriminating a particular keyword is; Note: Anderson’s inverse document frequency corresponds to the claimed “distinguishability criterion”), wherein each of the tokens includes either a value or a fragment of a value that is found in either a field of the selected data record or a combination of fields of the selected data record (see Anderson para. 0298: single words are identified from a multi-word field; Note: Anderson’s single words correspond to the claimed “tokens”).

As to claim 18, Anderson teaches a computing system, the computing system including:
an input device or port (see Anderson para. 0323: computer comprising input device or port) configured to receive data records, the received data records each including one or more values in one or more fields (see Anderson para. 0075: dataset comprising records, each record having values for respective fields) and
at least one processor (see Anderson para. 0323: computer comprising a processor) configured to maintain an association of data records with respective data clusters, each data record belonging to a single data cluster of the respective data clusters (see Anderson para. 0037: the system forms clusters of data elements, each data element belonging to one cluster), the maintaining including processing the received data records to identify a matched data cluster to associate with each received data record (see Anderson para. 0104 and Fig. 2A: clustering data records), the processing including, for selected data records from the received data records,
generating a query from the one or more values included in the selected data record (Note: the claimed “query” is interpreted in light of the instant specification, which states the following: “Generating the query includes identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields of the selected data record.” see para. 0006 of the published specification;
see Anderson para. 0104: field values of a data record are used to score the data record and compare it to existing cluster members; Note: field values of a data record, as taught by Anderson, correspond to the claimed “query”),
identifying one or more candidate data records from the received data records using the query (see Anderson para. 0104: field values of a data record are used to score the data record and compare it to cluster members; Note: Anderson’s cluster members correspond to candidate data records),
determining that the selected data record satisfies a cluster membership criterion for a plurality of existing data clusters (see Anderson para. 0104: comparison test used to determine cluster membership),
selecting the matched data cluster for the selected data record from the plurality of existing data clusters (see Anderson para. 0104: when the comparison test yields a score above a threshold, the corresponding data record is deemed a member of the cluster).
Anderson does not appear to explicitly disclose providing an indication that a selected data record satisfies a cluster membership criterion for a plurality of existing clusters and of a selected matched data cluster to a user.
However, Gehrking teaches:
providing an indication that a selected data record satisfies a cluster membership criterion for a plurality of existing clusters and of a selected matched data cluster to a user (see Gehrking para. 0103 and Fig. 19: a given document, e.g. a website, is clustered into candidate clusters, and the results are presented to a user to approve/disapprove each candidate cluster membership).
It is important to avoid mistakes or inaccuracies that may degrade data quality and result in erroneous clustering/classification (see Anderson para. 0004). Therefore, it would have been obvious to one having ordinary skill in the art at the time of the invention to have modified Anderson to include the teachings of Gehrking because it enables a user to manually approve/disapprove cluster memberships (see Gehrking para. 0103 and Fig. 19), allowing review by a human expert to enhance quality control.
Anderson as modified by Gehrking does not appear to explicitly disclose receiving, from a user, an instruction to modify an association of a particular received data record of received data records with its matched data cluster, the modification including changing an association of the particular received data record from a matched data cluster to another data cluster of a plurality of existing data clusters.
However, Bhattacharya teaches receiving, from a user, an instruction to modify an association of a particular received data record of received data records with its matched data cluster, the modification including changing an association of the particular received data record from a matched data cluster to another data cluster of a plurality of existing data clusters (see Bhattacharya para. 0023 and 0029: re-assignment of a data instance from one existing cluster to another) that satisfy a cluster membership criterion (see Bhattacharya para. 0021: cluster membership conditions).
It would have been obvious to one having ordinary skill in the art at the time of the invention to have modified Anderson as modified by Gehrking to include the teachings of Bhattacharya because it enables the user to supervise and guide the clustering process, resulting in significantly improved clustering performance (see Bhattacharya para. 0001).

As to claim 19, Anderson teaches a computer program that is stored on a computer-readable storage medium, the computer program including instructions for causing a computing system (see Anderson para. 0324: software stored on a computer-readable storage medium)
to receive data records (see Anderson para. 0075: dataset comprising records, each record having values for respective fields) and
to maintain an association of data records with respective data clusters, each data record belonging to a single data cluster of the respective data clusters (see Anderson para. 0037: the system forms clusters of data elements, each data element belonging to one cluster), the maintaining including processing the received data records to identify a matched data cluster to associate with each received data record (see Anderson para. 0104 and Fig. 2A: clustering data records),
wherein the received data records each include one or more values in one or more fields (see Anderson para. 0075: dataset comprising records, each record having values for respective fields),
wherein the instructions for causing the computer system to process the received data records include instructions for causing the computer system, for selected data records from the received data records,
to generate a query from the one or more values included in the selected data record (Note: the claimed “query” is interpreted in light of the instant specification, which states the following: “Generating the query includes identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields of the selected data record.” see para. 0006 of the published specification;
see Anderson para. 0104: field values of a data record are used to score the data record and compare it to existing cluster members; Note: field values of a data record, as taught by Anderson, correspond to the claimed “query”),
to identify one or more candidate data records from the received data records using the query (see Anderson para. 0104: field values of a data record are used to score the data record and compare it to cluster members; Note: Anderson’s cluster members correspond to candidate data records),
to determine that the selected data record satisfies a cluster membership criterion for a plurality of existing data clusters (see Anderson para. 0104: comparison test used to determine cluster membership),
to select the matched data cluster for the selected data record from the plurality of existing data clusters (see Anderson para. 0104: when the comparison test yields a score above a threshold, the corresponding data record is deemed a member of the cluster).
Anderson does not appear to explicitly disclose providing an indication that a selected data record satisfies a cluster membership criterion for a plurality of existing clusters and of a selected matched data cluster to a user.
However, Gehrking teaches:
providing an indication that a selected data record satisfies a cluster membership criterion for a plurality of existing clusters and of a selected matched data cluster to a user (see Gehrking para. 0103 and Fig. 19: a given document, e.g. a website, is clustered into candidate clusters, and the results are presented to a user to approve/disapprove each candidate cluster membership).
It is important to avoid mistakes or inaccuracies that may degrade data quality and result in erroneous clustering/classification (see Anderson para. 0004). Therefore, it would have been obvious to one having ordinary skill in the art at the time of the invention to have modified Anderson to include the teachings of Gehrking because it enables a user to manually approve/disapprove cluster memberships (see Gehrking para. 0103 and Fig. 19), allowing review by a human expert to enhance quality control.
Anderson as modified by Gehrking does not appear to explicitly disclose receiving, from a user, an instruction to modify an association of a particular received data record of received data records with its matched data cluster, the modification including changing an association of the particular received data record from a matched data cluster to another data cluster of a plurality of existing data clusters.
However, Bhattacharya teaches receiving, from a user, an instruction to modify an association of a particular received data record of received data records with its matched data cluster, the modification including changing an association of the particular received data record from a matched data cluster to another data cluster of a plurality of existing data clusters (see Bhattacharya para. 0023 and 0029: re-assignment of a data instance from one existing cluster to another) that satisfy a cluster membership criterion (see Bhattacharya para. 0021: cluster membership conditions).
It is widely accepted that clustering performed without supervision often does not lead to meaningful insights for the user and that supervised clustering may improve clustering results (see Bhattacharya paragraphs 0001 and 0019). It would have been obvious to one having ordinary skill in the art at the time of the invention to have modified Anderson as modified by Gehrking to include the teachings of Bhattacharya because it enables the user to supervise and guide the clustering process, resulting in significantly improved clustering performance (see Bhattacharya para. 0001 and 0019).

As to claim 20, Anderson teaches a computing system, the computing system (see Anderson para. 0323: computer system) including
means for receiving data records, the received data records each including one or more values in one or more fields (see Anderson para. 0075: dataset comprising records, each record having values for respective fields) and
means for processing the received data records to identify a matched data cluster to associate with each received data record (see Anderson para. 0104 and Fig. 2A: clustering data records), the processing including, for selected data records from the received data records,
generating a query from the one or more values included in the selected data record (Note: the claimed “query” is interpreted in light of the instant specification, which states the following: “Generating the query includes identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields of the selected data record.” see para. 0006 of the published specification;
see Anderson para. 0104: field values of a data record are used to score the data record and compare it to existing cluster members; Note: field values of a data record, as taught by Anderson, correspond to the claimed “query”),
identifying one or more candidate data records from the received data records using the query (see Anderson para. 0104: field values of a data record are used to score the data record and compare it to cluster members; Note: Anderson’s cluster members correspond to candidate data records),
determining that the selected data record satisfies a cluster membership criterion for a plurality of existing data clusters (see Anderson para. 0104: comparison test used to determine cluster membership),
selecting the matched data cluster for the selected data record from the plurality of existing data clusters (see Anderson para. 0104: when the comparison test yields a score above a threshold, the corresponding data record is deemed a member of the cluster).
Anderson does not appear to explicitly disclose providing an indication that a selected data record satisfies a cluster membership criterion for a plurality of existing clusters and of a selected matched data cluster to a user.
However, Gehrking teaches:
providing an indication that a selected data record satisfies a cluster membership criterion for a plurality of existing clusters and of a selected matched data cluster to a user (see Gehrking para. 0103 and Fig. 19: a given document, e.g. a website, is clustered into candidate clusters, and the results are presented to a user to approve/disapprove each candidate cluster membership).
It is important to avoid mistakes or inaccuracies that may degrade data quality and result in erroneous clustering/classification (see Anderson para. 0004). Therefore, it would have been obvious to one having ordinary skill in the art at the time of the invention to have modified Anderson to include the teachings of Gehrking because it enables a user to manually approve/disapprove cluster memberships (see Gehrking para. 0103 and Fig. 19), allowing review by a human expert to enhance quality control.
Anderson as modified by Gehrking does not appear to explicitly disclose receiving, from a user, an instruction to modify an association of a particular received data record of received data records with its matched data cluster, the modification including changing an association of the particular received data record from a matched data cluster to another data cluster of a plurality of existing data clusters.
However, Bhattacharya teaches receiving, from a user, an instruction to modify an association of a particular received data record of received data records with its matched data cluster, the modification including changing an association of the particular received data record from a matched data cluster to another data cluster of a plurality of existing data clusters (see Bhattacharya para. 0023 and 0029: re-assignment of a data instance from one existing cluster to another) that satisfy a cluster membership criterion (see Bhattacharya para. 0021: cluster membership conditions).
It is widely accepted that clustering performed without supervision often does not lead to meaningful insights for the user and that supervised clustering may improve clustering results (see Bhattacharya paragraphs 0001 and 0019). It would have been obvious to one having ordinary skill in the art at the time of the invention to have modified Anderson as modified by Gehrking to include the teachings of Bhattacharya because it enables the user to supervise and guide the clustering process, resulting in significantly improved clustering performance (see Bhattacharya para. 0001 and 0019).

Claim 10 is rejected under 35 U.S.C. 103(a) as being unpatentable over Anderson, Gehrking, and Bhattacharya as applied to claim 1 above, and further in view of Renders et al. (U.S. Patent Application Publication No. 20080249999 A1, hereinafter referred to as Renders).
As to claim 10, Anderson as modified by Gehrking and Bhattacharya teaches a data store that maps queries to existing clusters (Note: the claimed “query” is interpreted in light of the instant specification, which states the following: “Generating the query includes identifying tokens that each include at least one value or fragment of a value in a field or a combination of fields of the selected data record.” see para. 0006 of the published specification;
see Gehrking para. 0042 and Fig. 3: terms from a document are mapped to clusters using term-cluster index 115; Note: Gehrking’s terms correspond to the claimed queries).
Anderson as modified by Gehrking and Bhattacharya does not appear to explicitly disclose further including receiving a request to confirm membership of a selected data record in a first cluster, storing information in a data store, updating the data store in response to a request that is associated with another data record, and maintaining membership of the selected data record in the first cluster notwithstanding the request that is associated with another data record.
However, Renders teaches further including receiving a request to confirm membership of a selected data record in a first cluster (see Renders paragraph 0049: a system for semi-supervised clustering allows a user to confirm that the checked documents are outliers and should be clustered together in their own group), storing information in a data store, updating the data store in response to a request that is associated with another data record, and maintaining membership of the selected data record in the first cluster notwithstanding the request that is associated with another data record (see Renders paragraph 0020: a system for semi-supervised clustering maintains model 24 to store the classification of the clustered documents; and see Renders paragraph 0049: the system will update the model to constrain the next clustering iteration to maintain the checked documents in their own cluster).
It is widely accepted that clustering performed without supervision often does not lead to meaningful insights for the user and that supervised clustering may improve clustering results. Therefore, it would have been obvious to one having ordinary skill in the art at the time of the invention to have modified Anderson as modified by Gehrking and Bhattacharya to include the teachings of Renders because it would improve the clustering results and provide more meaningful insights for the user.

Claim 14 is rejected under 35 U.S.C. 103(a) as being unpatentable over Anderson, Gehrking, and Bhattacharya as applied to claim 1 above, and further in view of Inoue et al. (U.S. Patent Application Publication No. 20100281036 A1, hereinafter referred to as Inoue).
As to claim 14, Anderson as modified by Gehrking and Bhattacharya teaches further including selecting a particular data cluster to be the matched data cluster for the selected data record (see Anderson para. 0104: when the comparison test yields a score above a threshold, the corresponding data record is deemed a member of the cluster) and storing information identifying an existing data cluster (see Gehrking para. 0042 and Fig. 3: terms from a document are mapped to clusters using term-cluster index 115).
Anderson as modified by Gehrking and Bhattacharya does not appear to explicitly disclose storing information identifying an existing data cluster that was not selected as a matched data cluster for a selected data record.
However, Inoue teaches storing information identifying an existing data cluster that was not selected as a matched data cluster for a selected data record (see Inoue paragraph 0056: search-omitted information extraction unit 109 stores information about clusters that are not selected).
In a clustering system, a user may be dissatisfied with the cluster associated with a given data object, and hence the user might request that the object be re-clustered. In such a scenario, having stored information about candidate data clusters for the given data object would avoid the need to re-compute the candidate clusters. Therefore, it would have been obvious to one having ordinary skill in the art at the time of the invention to have modified Anderson as modified by Gehrking and Bhattacharya to include the teachings of Inoue because avoiding re-computing candidate clusters improves the efficiency of the system.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to UMAR MIAN whose telephone number is (571) 270-3970.  The examiner can normally be reached on Monday to Friday, 10 am to 6:30 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi can be reached on (571) 272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/UM/Examiner, Art Unit 2163


/ALEX GOFMAN/Primary Examiner, Art Unit 2163