Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
Status of Claims
This action is in reply to the application filed November 9, 2020 and preliminary amendments filed on June 21, 2021.
Claims 1-20 are currently pending.
Claims 21-25 have been cancelled.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on May 24, 2022 has been considered by the examiner.
	
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 2, 4-6, and 8-20 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3-5, and 7-18 of U.S. Patent No. 10,831,725. Although the claims at issue are not identical, they are not patentably distinct from each other because the instant claims are fully encompassed by the claims in the '725 patent.
Claim comparison: Instant Application 17/093,151 versus U.S. Patent 10,831,725
Instant Application 17/093,151, Claim 1:
An apparatus comprising:
     a processor configured to run one or more modules stored in memory, wherein the one or more modules are configured to:
     identify at least one pair of data records for which to determine a similarity value;
     determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records; and
     associate the at least one pair of data records with one or more clusters, each associated with a unique entity, based on the similarity value for the at least one pair of data records.

Instant Application 17/093,151, Claim 2:
The apparatus of claim 1, wherein the one or more modules are further configured to identify one or more pairs of data records for which a similarity value need not be determined based on a predetermined set of attributes that are likely to be shared by related data records.

U.S. Patent 10,831,725, Claim 1:
An apparatus comprising: 
     a processor configured to acquire instructions stored in one or more memories and execute the instructions to:    
     receive a plurality of data records from a plurality of data sources, wherein each of the data records includes a plurality of attributes describing an entity; 
     identify a pair of data records from the received data records and determine whether the pair of data records is eligible or ineligible for a similarity value determination based on a predetermined set of attributes shared by the plurality of attributes of the pair of data records; 
     exclude the pair of data records from the similarity value determination when the step of determining eligibility determines that the pair of data records is ineligible; 
     process the pair of data records through the similarity value determination when the step of determining eligibility determines that the pair of data records is eligible, wherein the similarity value determination includes determining a similarity value for the pair of data records based on similarity between the plurality of attributes in one data record in the pair and the plurality of attributes in another data record in the pair; 
     provide one or more clusters, wherein each of the clusters is configured to store data records of an unique entity, and associate the pair of data records with one of the clusters based on the similarity value for the pair of data records, wherein the cluster to which the pair of data records is associated is configured to store data records of the entity described by the plurality of attributes; 
     compare, after the step of associating the pair of data records with the cluster, the plurality of attributes in the pair of data records with attributes of other data records in the cluster, and determine one or more attribute differences; 
     determine at least one importance score for one or more first attributes in the plurality of attributes based on the determined attribute differences, wherein the determined importance score is used by the similarity value determination as a factor in modifying weights of the first attributes for determining the similarity value; and 
     identify another pair of data records and process the other pair of data records through the similarity value determination; 
     wherein each of the other pair of data records includes a second plurality of attributes; and 
     wherein the similarity value determination includes determining that the second plurality of attributes includes the first attributes, and determining the similarity value for the other pair of data records using the modified weights.
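Corresponding dependent claims:
Instant claim 4 : Patent claim 3
Instant claim 5 : Patent claim 4
Instant claim 6 : Patent claim 5
Instant claim 8 : Patent claim 7
Instant claim 9 : Patent claim 8
Instant claim 10 : Patent claim 9
Instant claim 11 : Patent claim 10
Instant claim 12 : Patent claim 11
Instant claim 13 : Patent claim 12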
Instant Application 17/093,151, Claim 14:
     A method for clustering a plurality of data records into at least one cluster, the method comprising:
     identifying, at a candidate reduction module in a computing device, at least one pair of the plurality of data records for which to determine a similarity value;
     determining, at a similarity computation module residing in the computing device, in communication with the candidate reduction module, the similarity value for the at least one pair based, at least in part, on a plurality of attributes associated with the at least one pair of data records; and
     associating, at a clustering computation module residing in the computing device, in communication with the similarity computation module, the at least one pair of data records with one or more clusters, each associated with a unique entity, based on the similarity value for the at least one pair of data records.

Instant Application 17/093,151, Claim 15:
The method of claim 14, further comprising identifying, at the candidate reduction module, one or more pairs of the plurality of data records for which a similarity value need not be determined based on a predetermined set of attributes that are likely to be shared by related data records.



U.S. Patent 10,831,725, Claim 13:
     A method comprising: 
     receiving, by a processor, a plurality of data records from a plurality of data sources, wherein each of the data records includes a plurality of attributes describing an entity; 
     identifying, by the processor, a pair of data records from the received data records and determining, by the processor, whether the pair of data records is eligible or ineligible for a similarity value determination based on a predetermined set of attributes shared by the plurality of attributes of the pair of data records; 
     excluding, by the processor, the pair of data records from the similarity value determination when the step of determining eligibility determines that the pair of data records is ineligible; 
     processing, by the processor, the pair of data records through the similarity value determination when the step of determining eligibility determines that the pair of data records is eligible, wherein the similarity value determination includes determining a similarity value for the pair of data records based on similarity between the plurality of attributes in one data record in the pair and the plurality of attributes in another data record in the pair; 
     providing, by the processor, one or more clusters, wherein each of the clusters is configured to store data records of an unique entity, and associating, by the processor, the pair of data records with one of the clusters based on the similarity value for the pair of data records, wherein the cluster to which the pair of data records is associated is configured to store data records of the entity described by the plurality of attributes; 
     comparing, after the step of associating the pair of data records with the cluster, by the processor, the plurality of attributes in the pair of data records with attributes of other data records in the cluster, and determining, by the processor, one or more attribute differences; 
     determining, by the processor, at least one importance score for one or more first attributes in the plurality of attributes based on the determined attribute differences, wherein the determined importance score is used by the similarity value determination as a factor in modifying weights of the first attributes for determining the similarity value; and 
     identifying, by the processor, another pair of data records and processing, by the processor, the other pair of data records through the similarity value determination; 
     wherein each of the other pair of data records includes a second plurality of attributes; and 
     wherein the similarity value determination includes determining that the second plurality of attributes includes the first attributes, and determining the similarity value for the other pair of data records using the modified weights.
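Corresponding dependent claims:
Instant claim 16 : Patent claim 14
Instant claim 17 : Patent claim 15
Instant claim 18 : Patent claim 16
Instant claim 19 : Patent claim 17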
Instant Application 17/093,151, Claim 20:
    A computer program product, tangibly embodied in a non-transitory computer-readable storage medium, the computer program product including instructions operable to cause a data processing system to:
    identify at least one pair of data records for which to determine a similarity value;
     determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records; and
     associate the at least one pair of data records with one or more clusters, each associated with a unique entity, based on the similarity value for the at least one pair of data records.

U.S. Patent 10,831,725, Claim 18:
     A computer program product, tangibly embodied in a non-transitory computer-readable storage medium, the computer program product including instructions executable by a processor to: 
     receive a plurality of data records from a plurality of data sources, wherein each of the data records includes a plurality of attributes describing an entity; 
     identify a pair of data records from the received data records and determine whether the pair of data records is eligible or ineligible for a similarity value determination based on a predetermined set of attributes shared by the plurality of attributes of the pair of data records; 
     exclude the pair of data records from the similarity value determination when the step of determining eligibility determines that the pair of data records is ineligible; 
     process the pair of data records through the similarity value determination when the step of determining eligibility determines that the pair of data records is eligible, wherein the similarity value determination includes determining a similarity value for the pair of data records based on similarity between the plurality of attributes in one data record in the pair and the plurality of attributes in another data record in the pair; 
     provide one or more clusters, wherein each of the clusters is configured to store data records of an unique entity, and associate the pair of data records with one of the clusters based on the similarity value for the pair of data records, wherein the cluster to which the pair of data records is associated is configured to store data records of the entity described by the plurality of attributes; 
     compare, after the step of associating the pair of data records with the cluster, the plurality of attributes in the pair of data records with attributes of other data records in the cluster, and determine one or more attribute differences; 
     determine at least one importance score for one or more first attributes in the plurality of attributes based on the determined attribute differences, wherein the determined importance score is used by the similarity value determination as a factor in modifying weights of the first attributes for determining the similarity value; and
     identify another pair of data records and process the other pair of data records through the similarity value determination; wherein each of the other pair of data records includes a second plurality of attributes; and 
     wherein the similarity value determination includes determining that the second plurality of attributes includes the first attributes, and determining the similarity value for the other pair of data records using the modified weights.



Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 3, 15, and 16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.  The term “likely to be shared” in claims 2 and 15 is a relative term which renders the claims indefinite. The term “likely” is not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.  Claims 3 and 16 depend from claims 2 and 15, respectively, and inherit the deficiencies of claims 2 and 15.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101.  When considering subject matter eligibility under 35 U.S.C. § 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1).  If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A).  The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined whether or not the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity).  If it is determined in Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the second prong (Step 2A, Prong 2), where it is determined whether or not the claims integrate the judicial exception into a practical application.  If it is determined in Step 2A, Prong 2 that the claims do not integrate the judicial exception into a practical application, the analysis proceeds to determining whether the claim is a patent-eligible application of the exception (Step 2B).  If an abstract idea is present in the claim, any element or combination of elements in the claim must be sufficient to ensure that the claim integrates the judicial exception into a practical application, or else amounts to significantly more than the abstract idea itself.
According to the first part of the analysis, in the instant case claims 1-13 are directed to an apparatus, claims 14-19 are directed to a method, and claim 20 is directed to a computer program product.  Thus, each of the claims falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).  
Following the determination of whether or not the claims fall within one of the four categories (Step 1), it must be determined if the claims recite a judicial exception (e.g. mathematical concepts, mental processes, certain methods of organizing human activity) (Step 2A, Prong 1).  In this case, the claims are determined to recite a judicial exception as explained below.

Regarding independent claim 1, the limitations detailing “identify at least one pair of data records for which to determine a similarity value; determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records; and associate the at least one pair of data records with one or more clusters, each associated with a unique entity, based on the similarity value for the at least one pair of data records” are, as drafted, processes that, under broadest reasonable interpretation, cover performance of the limitations in the mind.  (The computer components are additional elements and are addressed later in the analysis.)
For example, the identification of data records, the determination of a similarity value from attributes, and the association of records with clusters are not described in detail, and, under broadest reasonable interpretation, may be steps completed by a person mentally or with pen and paper.  A human is capable of identifying data records for which to determine a similarity value, determining a similarity value between a pair of records, and associating the pair of records with a cluster.  Therefore, under broadest reasonable interpretation, these steps could be performed by a human mentally or with pen and paper.

Regarding dependent claim 2, the identification of data records for which a similarity value need not be determined can be performed by a human mentally.  The claim does not detail how the determinations are performed, and, under broadest reasonable interpretation, these steps can be performed mentally by a human.
Regarding claim 3, the adjustment of predetermined attributes is not detailed and can be performed by a human mentally or with pen and paper.
Regarding claim 4, determining a similarity value based on a similarity function learned from training data records is not detailed and is a step a human may perform mentally.
Regarding claim 5, the steps regarding inference of an importance, determining differences between components, and determining an importance based on a number of appearances are not detailed and are steps a human may perform mentally.
Regarding claim 6, the steps regarding inference of a likelihood of interchange, determining difference between components, and determining a likelihood of interchanging are not detailed and are steps a human may perform mentally.
Regarding claim 7, the steps regarding determining a conditional likelihood that a missing attribute had a particular component, determining a combination of known attributes corresponding to a particular entity, determining all variations of a missing attribute amongst data records, and determining a conditional probability that the missing attribute has a particular component are not detailed and are steps a human may perform mentally.
Regarding claim 8, the steps regarding the representation of data records as nodes in a graph, representing the similarity by edges between nodes, and determining clusters from the graph are not detailed and are steps a human may perform mentally, or with pen and paper.
Regarding claim 9, the determination of clusters based on a graph clustering technique is not detailed and is a step a human may perform mentally.
Regarding claim 10, the association of two data records with the same cluster based on a directive is not detailed and is a step a human may perform mentally.
Regarding claim 11, the steps detailing associating data records with one or more clusters using a clustering technique and adjusting a parameter for each of the one or more clusters independently are not detailed and are steps a human may perform mentally.
Regarding claim 12, the determination of a similarity value by receiving the similarity value is a step that a human may perform mentally.
Regarding claim 13, the steps detailing receiving a plurality of sub-clusters and performing a union-find operation are not detailed and are steps a human may perform mentally.
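For reference only, the union-find operation named in claim 13 can be sketched as below. This is a minimal illustration under stated assumptions and is not taken from the instant application or from any cited reference: the function name merge_subclusters, the record identifiers, and the choice to merge sub-clusters that share a record are all hypothetical.

```python
def merge_subclusters(subclusters):
    """Merge sub-clusters that share at least one record (hypothetical helper).

    subclusters: iterable of iterables of hashable record identifiers.
    Returns a sorted list of merged clusters, each a sorted list of identifiers.
    """
    parent = {}

    def find(x):
        # Register unseen records as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    for sub in subclusters:
        members = list(sub)
        if not members:
            continue
        find(members[0])  # ensure single-record sub-clusters are registered
        for other in members[1:]:
            union(members[0], other)

    # Group every record under the root of its set.
    merged = {}
    for record in list(parent):
        merged.setdefault(find(record), set()).add(record)
    return sorted(sorted(group) for group in merged.values())
```

In this sketch, sub-clusters ["a", "b"] and ["b", "c"] share record "b" and would therefore be merged into a single cluster, while ["d"] would remain separate.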

Independent claims 14 and 20 are similar in scope to claim 1 and rejected for the same reasons as above.  Additionally, dependent claims 15-19 correspond to dependent claims 2-6.  The dependent claims are rejected under the same rationale as above.

Following the determination that the claims recite a judicial exception, it must be determined if the claims recite additional elements that integrate the exception into a practical application of the exception (Step 2A, Prong 2).  In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not include additional elements that integrate the exception into a practical application of the exception as explained below.

Regarding independent claim 1, the additional elements include an apparatus, modules, and memory.  The apparatus, modules, and memory are recited at a high level of generality such that they amount to no more than generic computer components.  Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.  Dependent claims 2-11 do not recite any new additional elements and do not integrate the judicial exception into a practical application.  Claim 12 recites additional elements including “another computing device.”  The “another computing device” is not detailed and amounts to generic computing components.  Similarly, claim 13 recites a plurality of computing devices, but provides no details and the devices amount to generic computing components.

Independent claims 14 and 20 are similar in scope to claim 1 and rejected for the same reasons as above.  Further, claim 14 recites a computing device, but the device is recited at a high level of generality such that it amounts to no more than a generic computer component.  Claim 20 recites a computer program product and non-transitory computer-readable storage medium, but these elements are recited at a high level of generality such that they amount to no more than generic computer components.  Dependent claims 15-19 do not recite any new additional elements.

Based on the determination in Step 2A of the analysis that the claims are directed to a judicial exception, it must be determined if the claims contain any element or combination of elements sufficient to ensure that the claim amounts to significantly more than the judicial exception (Step 2B). In this case, after considering all claim elements individually and as an ordered combination, it is determined that the claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception as explained below.

Regarding independent claim 1, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than generic computer components.	
Dependent claims 2-11 do not recite any new additional elements and do not provide limitations that amount to significantly more than the judicial exception.  The additional elements in claims 12 and 13 do not amount to significantly more because they also are generic computing components.
Independent claims 14 and 20 are similar in scope to claim 1 and rejected for the same reasons as above.  Further, claim 14 and dependent claims 15-19 do not contain any additional elements beyond the generic computer components addressed above.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-6, 8-11, and 14-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Burdick et al., U.S. Patent Application Publication 2004/0181526 (Burdick).

With respect to independent claim 1, Burdick teaches:
An apparatus comprising:
a processor configured to run one or more modules stored in memory (Burdick teaches implementing the disclosed steps as a computer program product with various processing steps; see at least [0013], [0056], [0068], and claim 14.  A computer program product inherently comprises a memory; see [0013].), wherein the one or more modules are configured to:
identify at least one pair of data records for which to determine a similarity value (Burdick teaches determining a numerical record similarity score for a pair of records; see [0026].);
determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records (Burdick teaches records comprising data fields that are used to determine similarity; see [0011]-[0012] and [0027].); and
associate the at least one pair of data records with one or more clusters, each associated with a unique entity (Burdick teaches forming clusters of record pairs; see [0028].  Burdick also teaches identifying sets of database records associated with a real-world entity; see [0023].), based on the similarity value for the at least one pair of data records (Burdick teaches identifying sets of database records associated with a real-world entity by learning a record similarity measurement; see [0023].  The sets of records form a cluster that may represent a person; see figure 2 and [0028].).
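
For orientation, the three limitations of claim 1 mapped above (identifying a pair, determining a similarity value from a plurality of attributes, and associating the pair with a cluster) can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation and not Burdick's method: the fraction-of-matching-attributes measure, the 0.5 threshold, and the names similarity and cluster_records are hypothetical.

```python
from itertools import combinations


def similarity(a, b, attributes):
    """Fraction of the listed attributes on which two records agree (assumed measure)."""
    matches = sum(1 for attr in attributes if a.get(attr) == b.get(attr))
    return matches / len(attributes)


def cluster_records(records, attributes, threshold=0.5):
    """Group records whose pairwise similarity meets the (assumed) threshold.

    Returns a sorted list of clusters, each a list of record indices.
    """
    # Every record starts in its own cluster, labeled by its index.
    label = list(range(len(records)))
    # Step 1: identify candidate pairs (here, simply every pair).
    for i, j in combinations(range(len(records)), 2):
        # Step 2: determine the similarity value from several attributes.
        score = similarity(records[i], records[j], attributes)
        # Step 3: associate the pair with a single cluster when similar enough.
        if score >= threshold:
            old, new = label[j], label[i]
            label = [new if current == old else current for current in label]
    clusters = {}
    for index, current in enumerate(label):
        clusters.setdefault(current, []).append(index)
    return sorted(clusters.values())
```

With three records in which the first two share a name and postal code and the third shares nothing, the first two would be placed in one cluster and the third in its own.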

With respect to independent claim 14, Burdick teaches:
A method for clustering a plurality of data records into at least one cluster, the method comprising:
identifying, at a candidate reduction module in a computing device (Burdick teaches implementing the disclosed steps on a computer program product and various processing steps; see at least [0013], [0056], [0068], and claim 14.), at least one pair of the plurality of data records for which to determine a similarity value (Burdick teaches determining a numerical record similarity score for a pair of records; see [0026].);
determining, at a similarity computation module residing in the computing device, in communication with the candidate reduction module, the similarity value for the at least one pair based, at least in part, on a plurality of attributes associated with the at least one pair of data records (Burdick teaches records comprising data fields that are used to determine similarity; see [0011]-[0012] and [0027].); and
associating, at a clustering computation module residing in the computing device, in communication with the similarity computation module, the at least one pair of data records with one or more clusters, each associated with a unique entity (Burdick teaches forming clusters of record pairs; see [0028].), based on the similarity value for the at least one pair of data records (Burdick teaches identifying sets of database records associated with a real-world entity; see [0023].  The sets of records form a cluster that may represent a person; see figure 2 and [0028].).

With respect to independent claim 20, Burdick teaches:
A computer program product (Burdick teaches implementing the disclosed steps on a computer program product; see [0013] and claim 14.), tangibly embodied in a non-transitory computer-readable storage medium, the computer program product including instructions operable to cause a data processing system to:
identify at least one pair of data records for which to determine a similarity value (Burdick teaches determining a numerical record similarity score for a pair of records; see [0026].);
determine the similarity value for the at least one pair of data records based, at least in part, on a plurality of attributes associated with the at least one pair of data records (Burdick teaches records comprising data fields that are used to determine similarity; see [0011]-[0012] and [0027].); and
associate the at least one pair of data records with one or more clusters (Burdick teaches forming clusters of record pairs; see [0028].), each associated with a unique entity, based on the similarity value for the at least one pair of data records (Burdick teaches identifying sets of database records associated with a real-world entity; see [0023].  The sets of records form a cluster that may represent a person; see figure 2 and [0028].).

With respect to dependent claim 2, the rejection of claim 1 is incorporated.  Further, Burdick teaches:
the one or more modules are further configured to identify one or more pairs of data records for which a similarity value need not be determined based on a predetermined set of attributes that are likely to be shared by related data records (Burdick teaches determining a similarity score, and, alternatively, receiving a predetermined similarity score for records; see [0011].).

With respect to dependent claim 3, the rejection of claim 2 is incorporated.  Further, Burdick teaches:
the one or more modules are configured to adjust the predetermined set of attributes based on association of data records to clusters from a previous iteration (Burdick teaches updating similarity measurements through multiple iterations; see [0054].).

With respect to dependent claim 4, the rejection of claim 1 is incorporated.  Further, Burdick teaches:
the one or more modules are configured to determine the similarity value based on a similarity function learned from training data records (Burdick teaches learning and updating a similarity function that is then used to update similarity scores; see [0027], [0030], and [0054].).

With respect to dependent claim 5, the rejection of claim 4 is incorporated.  Further, Burdick teaches:
the similarity function is designed to infer an importance of a particular component associated with a particular attribute of a data record (Burdick teaches learning and updating a similarity function that is then used to update similarity scores; see [0027], [0030], and [0054].), wherein the similarity function is learned by:
determining differences between components associated with the particular attribute of the training data records, wherein the training data records are known belong to the same cluster (Burdick teaches determining differences in records; see figures 1 & 2, [0004]-[0005], [0025], [0029].); and
determining the importance of the particular component based on a number of times the particular component appears in the differences (Burdick teaches determining similar records based on a shared number of similarity values and whether there is a high or low number of similarity values; see [0029].).

With respect to dependent claim 6, the rejection of claim 4 is incorporated.  Further, Burdick teaches:
the similarity function is designed to infer a likelihood of interchanging a first component in a particular attribute of a data record with a second component (Burdick teaches an iterative process comprising a method to select the best attribute among multiple attributes (interchange); see [0008].  The system may then present this information to a user who makes the selection; see [0049].), wherein the similarity function is learned by:
determining differences between components associated with the particular attribute of the training data records, wherein the training data records are known belong to the same cluster (Burdick teaches determining differences in records; see figures 1 & 2, [0004]-[0005], [0025], [0029].); and
determining the likelihood of interchanging the first component with the second component based on a number of times the first component and the second component appears in the differences at the same time (Burdick teaches determining similar records based on a shared number of similarity values and whether there is a high or low number of similarity values; see [0029].). 
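For illustration only, the counting steps recited in claims 5 and 6 can be sketched as follows. This is a minimal sketch and not a characterization of Burdick or of the application: it assumes that "components" are whitespace-separated tokens of an attribute value, that a "difference" is the symmetric difference of two token sets, and that the function name `learn_from_differences` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def learn_from_differences(training_clusters):
    """Count components, and co-occurring component pairs, in pairwise differences.

    training_clusters: list of lists; each inner list holds values of one
    attribute taken from training records known to belong to the same cluster.
    """
    component_counts = Counter()  # times a component appears in a difference
    pair_counts = Counter()       # times two components appear in the same difference
    for values in training_clusters:
        for a, b in combinations(values, 2):
            # Assumed definition: tokens present in one value but not the other.
            diff = set(a.lower().split()) ^ set(b.lower().split())
            component_counts.update(diff)
            pair_counts.update(frozenset(p) for p in combinations(sorted(diff), 2))
    return component_counts, pair_counts


# Hypothetical training data: company names already known to be the same entity.
training_clusters = [
    ["Acme Inc", "Acme Incorporated", "Acme"],
    ["Globex Inc", "Globex Incorporated"],
]
component_counts, pair_counts = learn_from_differences(training_clusters)
```

Under these assumptions, a component that appears often in the differences (here "inc") would be inferred to carry little weight, and two components that often appear in the same difference (here "inc" and "incorporated") would be inferred to be likely interchangeable.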

With respect to dependent claim 8, the rejection of claim 1 is incorporated.  Further, Burdick teaches:
the one or more modules are configured to:
represent the plurality of data records as a plurality of nodes in a graph (Burdick teaches representing fields as nodes; see figures 4 and 5 as well as [0038]-[0039].);
represent the similarity value for the at least one pair of data records as at least one edge between nodes, in the graph, corresponding to the at least one pair of data records (See figures 4 & 5 and [0025] of Burdick, which teach decision trees with nodes and edges.); and
determine the one or more clusters based on the graph (Burdick teaches using a decision tree to determine similarity scores; see [0011]-[0012].).

With respect to dependent claim 9, the rejection of claim 8 is incorporated.  Further, Burdick teaches:
the one or more modules are configured to determine the one or more clusters based on the graph using a graph clustering technique (Burdick teaches clustering data using decision trees; see figures 7A & 7B as well as [0056].).
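For illustration only, the graph limitations of claims 8 and 9 can be sketched as follows. This is a minimal sketch under stated assumptions, not the technique of Burdick or of the application: connected components over thresholded edges is only one possible graph clustering technique, and the function name, the threshold, and the sample edge weights are hypothetical.

```python
from collections import defaultdict


def cluster_by_connected_components(nodes, weighted_edges, threshold):
    """Group nodes joined by edges whose similarity value meets the threshold."""
    adjacency = defaultdict(set)
    for (u, v), weight in weighted_edges.items():
        if weight >= threshold:  # keep only sufficiently similar pairs
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen, clusters = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(component)
    return clusters


# Data records as nodes; similarity values as edges between pairs of nodes.
nodes = ["r1", "r2", "r3", "r4"]
edges = {("r1", "r2"): 0.9, ("r2", "r3"): 0.8, ("r3", "r4"): 0.2}
clusters = cluster_by_connected_components(nodes, edges, threshold=0.5)
```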

With respect to dependent claim 10, the rejection of claim 1 is incorporated.  Further, Burdick teaches:
the one or more modules are configured to receive a clustering directive requiring the one or more modules to associate two data records with the same cluster (Burdick teaches clustering data using decision trees; see figures 7A & 7B as well as [0056].).

With respect to dependent claim 11, the rejection of claim 1 is incorporated.  Further, Burdick teaches:
the one or more modules are configured to associate at least one of the plurality of data records to one or more clusters using a clustering technique (Burdick teaches clustering data using decision trees; see figures 7A & 7B as well as [0056].); and
adjust a parameter for the clustering technique for each of the one or more clusters independently, based on data records in the one or more clusters (Burdick teaches that a user may provide assistance and classify the data in ambiguous or conflicting cases, and that the procedure is then re-executed; see [0054].).

With respect to dependent claim 15, the rejection of claim 14 is incorporated.  Further, Burdick teaches:
identifying, at the candidate reduction module, one or more pairs of the plurality of data records for which a similarity value need not be determined based on a predetermined set of attributes that are likely to be shared by related data records (Burdick teaches determining a similarity score, and, alternatively, receiving a predetermined similarity score for records; see [0011].).

With respect to dependent claim 16, the rejection of claim 15 is incorporated.  Further, Burdick teaches:
adjusting, at the candidate reduction module, the predetermined set of attributes based on association of data records to clusters from a previous iteration (Burdick teaches updating similarity measurements through multiple iterations; see [0054].).

With respect to dependent claim 17, the rejection of claim 14 is incorporated.  Further, Burdick teaches:
determining, at the similarity computation module, the similarity value based on a similarity function learned from training data records (Burdick teaches learning and updating a similarity function that is then used to update similarity scores; see [0027], [0030], and [0054].).

With respect to dependent claim 18, the rejection of claim 17 is incorporated.  Further, Burdick teaches:
the similarity function is designed to infer an importance of a particular component associated with a particular attribute of a data record (Burdick teaches learning and updating a similarity function that is then used to update similarity scores; see [0027], [0030], and [0054].).

With respect to dependent claim 19, the rejection of claim 17 is incorporated.  Further, Burdick teaches:
the similarity function is designed to infer a likelihood of interchanging a first component in a particular attribute of a data record with a second component (Burdick teaches an iterative process comprising a method to select the best attribute among multiple attributes (interchange); see [0008].  The system may then present this information to a user who makes the selection; see [0049].).


Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Burdick et al., U.S. Patent Application Publication 2004/0181526 (Burdick) in view of Breese et al., “Empirical Analysis of Predictive Algorithms for Collaborative Filtering” (Breese).

With respect to claim 7, the rejection of claim 4 is incorporated and further Burdick does not explicitly teach:
the similarity function is to determine a conditional likelihood that a missing attribute of a data record has a particular component, wherein the conditional likelihood is to be determined by:
determining a combination of known attributes corresponding to a particular entity; 
determining all variations of a missing attribute amongst data records of the particular entity having the combination of known attributes; and
determining a conditional probability, based on the variations of the missing attribute, that the missing attribute has a particular component given that the data record has the particular combination of known attributes.

However, Breese teaches these limitations:
the similarity function is to determine a conditional likelihood that a missing attribute of a data record has a particular component, wherein the conditional likelihood is to be determined by:
determining a combination of known attributes corresponding to a particular entity (Breese teaches collaborative filtering that predicts the utility of items to a particular user (entity) based on a database of user votes from samples; see the first paragraph of section 2.); 
determining all variations of a missing attribute amongst data records of the particular entity having the combination of known attributes (Breese teaches a decision tree that determines all variations regarding whether a user watched various television programs; see figure 1 and section 2.3.2.); and
determining a conditional probability, based on the variations of the missing attribute, that the missing attribute has a particular component given that the data record has the particular combination of known attributes (Breese teaches representing each conditional probability by a decision tree; see figure 1 and section 2.3.2.).
Burdick and Breese are analogous art directed to record analysis.  Burdick teaches a method for determining similarity between records and clustering those records, and Breese teaches collaborative filtering methods that implement a conditional probability and clustering.
It would have been obvious to one of ordinary skill in the art of training neural networks, before the effective filing date of the claimed invention, to incorporate Breese’s clustering methods into Burdick’s disclosed system.  One of ordinary skill would have been motivated to do so in order to benefit from the performance of Bayesian networks with decision trees; see sections 4.1 and 2.3.2, cited above, which discuss the details of the Bayesian network implementation.
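For illustration only, the three steps recited in claim 7 can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce the decision-tree representation of Breese: it estimates the conditional probability by simple relative frequency, and the function name `conditional_probabilities`, the attribute names, and the sample records are hypothetical.

```python
from collections import Counter


def conditional_probabilities(entity_records, known, missing_attribute):
    """Estimate P(missing_attribute = value | known attributes) by counting.

    entity_records: records associated with one particular entity.
    known: the combination of known attribute values to condition on.
    """
    # Keep records of the entity that share the combination of known attributes
    # and that actually carry a value for the attribute of interest.
    matching = [
        r for r in entity_records
        if missing_attribute in r and all(r.get(k) == v for k, v in known.items())
    ]
    # Collect all variations of the attribute among those records.
    counts = Counter(r[missing_attribute] for r in matching)
    total = sum(counts.values())
    # Relative frequency serves as the conditional probability estimate.
    return {value: n / total for value, n in counts.items()} if total else {}


entity_records = [
    {"city": "Austin", "state": "TX", "zip": "78701"},
    {"city": "Austin", "state": "TX", "zip": "78701"},
    {"city": "Austin", "state": "TX", "zip": "78702"},
    {"city": "Dallas", "state": "TX", "zip": "75201"},
]
probs = conditional_probabilities(
    entity_records, known={"city": "Austin", "state": "TX"}, missing_attribute="zip"
)
```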

Claims 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Burdick et al., U.S. Patent Application Publication 2004/0181526 (Burdick) in view of Wang et al., “Achieving Usable and Privacy-assured Similarity Search over Outsourced Cloud Data” (Wang).

With respect to claim 12, the rejection of claim 1 is incorporated and further Burdick does not explicitly disclose:
the one or more modules are configured to determine the similarity value for the at least one pair of data records by receiving the similarity value for the at least one pair of data records from another computing device.

However, Wang teaches this limitation:
Wang teaches performing similarity searches and determining string similarity in a cloud environment comprising multiple computing devices; see section II.
Burdick and Wang are analogous art directed to record analysis and similarity determination.  Burdick teaches a method for determining similarity between records and clustering those records, and Wang teaches performing a similarity analysis in a cloud environment.
It would have been obvious to one of ordinary skill in the art of training neural networks, before the effective filing date of the claimed invention, to incorporate Wang’s cloud implementation into Burdick’s disclosed system.  One of ordinary skill would have been motivated to do so in order to benefit from constant search time complexity; see abstract.

With respect to claim 13, the rejection of claim 1 is incorporated and further Burdick teaches:
the one or more modules are configured to: 
perform a union-find operation on the plurality of sub-clusters to identify the one or more clusters (Burdick teaches analyzing records to determine similarities and clusters; see figures 1 and 2, [0004]-[0005], [0025], [0029].).

the processor is configured to:
perform a union-find operation on the plurality of sub-clusters to identify the cluster (Burdick teaches analyzing records to determine similarities and clusters; see figures 1 and 2, [0004]-[0005], [0025], [0029].).

Burdick does not explicitly disclose:
receive, from a plurality of computing devices, a plurality of sub-clusters independently identified at the plurality of computing devices.

However, Wang teaches this limitation:
Wang teaches a cloud-based platform serving users on demand; see abstract.  The user devices comprise a plurality of computer devices, and input from the devices is processed; see section I.
See the rejection of claim 12 for the motivation to combine.
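For illustration only, the union-find operation recited in claim 13 can be sketched as follows. This is a minimal sketch and not a characterization of Burdick, Wang, or the application: it assumes each sub-cluster arrives as a set of record identifiers, and the class `UnionFind`, the function `merge_sub_clusters`, and the sample data are hypothetical.

```python
class UnionFind:
    """Disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)  # register unseen items as their own root
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def merge_sub_clusters(sub_clusters):
    """Merge sub-clusters that share at least one record into final clusters."""
    uf = UnionFind()
    for sub in sub_clusters:
        members = sorted(sub)
        for member in members:
            uf.find(member)  # ensure singletons are registered too
        for member in members[1:]:
            uf.union(members[0], member)
    groups = {}
    for record in list(uf.parent):
        groups.setdefault(uf.find(record), set()).add(record)
    return list(groups.values())


# Sub-clusters as they might be received from separate computing devices.
sub_clusters = [{"r1", "r2"}, {"r2", "r3"}, {"r4"}]
merged = merge_sub_clusters(sub_clusters)
```

In this sketch the first two sub-clusters share record "r2" and therefore collapse into a single cluster, while "r4" remains on its own.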

Conclusion
Claims 1-20 are rejected.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL T PELLETT whose telephone number is (571)270-7156.  The examiner can normally be reached on Monday - Friday 9-5 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li Zhen, can be reached at 571-272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DANIEL T PELLETT/Primary Examiner, Art Unit 2121