DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 25-45 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-6 and 13-18 of U.S. Patent No. US 11,132,343 B1. Although the claims at issue are not identical, they are not patentably distinct from each other. See the table below for more detail.
App. No. 17/446,336
Patent No. US 11,132,343 B1
Claim 25: A computer program product, stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations implementing automatic entity resolution (ER) data cleaning, the operations comprising:

receiving a group of N references to a candidate ER error entity, wherein the references and the candidate ER entity are persistent data stored in a data repository, and wherein the references each include attribute data describing the candidate ER entity; 

identifying a set of related references within the group of N references, wherein each related reference is related to at least one other reference of the group of N references;

for each pair of references identified as a part of the set of the related references, calculating an ER score representing a likelihood that the pair of related references refers to the candidate ER error entity;

determining based on the calculation of each ER score, that an ER error has occurred in grouping the references;

selecting a set of the pairs of the related references for validation of their ER scores,

sending the selected set of the pairs to an oracle for validation of their ER scores;

receiving validated ER scores from the oracle, the validated ER scores indicative of a match or non-match and are associated with a value between 0 and 1;

adjusting at least one of the validated ER scores by calculating a paralyzed ER score including determining whether the validated ER score satisfies a match threshold; and

performing a recursive re-grouping process comprising:

re-grouping, using a grouping method, the set of related references based in part on their respectively associated ER scores forming additional new input data based on the re-grouping of the references, the respectively associated ER scores comprising the paralyzed ER score;

re-assigning the group of each of the N references based on the re-grouping; and

recalculating a pairwise matching of the additional new input data formed based on the re-grouping of the references.

Similarly, claims 32 and 39.

Claim 1: A computer program product, stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations implementing automatic entity resolution (ER) data cleaning, the operations comprising: 

receiving a group of N references to a candidate ER error entity, wherein the references and the candidate ER entity are persistent data stored in a data repository, and wherein the references each include attribute data describing the candidate ER entity; 

identifying a set of related references within the group of N references, wherein each related reference is related to at least one other reference of the group of N references; 

for each pair of references identified as a part of the set of the related references, calculating an ER score representing a likelihood that the pair of related references refers to the candidate ER error entity, wherein calculation of each ER score is implemented using a machine learning algorithm, wherein a binary classifier, derived using supervised machine learning, is trained to return a result label of “match” or “no match” as a decision of whether or not an input pair of entity references describes the same entity, and wherein the result label is returned with a value of an ER score representative of a certainty in the decision; 

determining based on the calculation of each ER score, whether an ER error has occurred in grouping the references;

in an instance in which a determination that no errors have occurred in the grouping of references, returning the grouping of references as a single grouping;

in an instance in which a determination is made that an ER error has occurred in the grouping of the references, 

selecting a set of the pairs of the related references for validation of their ER scores, wherein the set of pairs selected for validation is a subset of all the related references, the subset of all the related references being less than an entirety of the related references, wherein the ER score threshold and the number of pairs to select for validation are parameterized such that any ER score failing to meet a predefined threshold is a candidate for validation and each set of pairs having an ER score failing to meet the predefined threshold up to a predefined maximum amount of pairs is selected for validation, wherein the predefined maximum is a size of the set of pairs selected for validation, and wherein selection of the predefined maximum amount of pairs for validation is based on results from grid search experimentation such that particular ER score thresholds are associated with a particular maximum amount of pairs selected for validation; 

sending the selected set of the pairs to an oracle for validation of their ER scores;

receiving validated ER scores from the oracle, the validated ER scores indicative of a match or non-match and are associated with a value between 0 and 1; 

adjusting at least one of the validated ER scores by calculating a paralyzed ER score including determining whether the validated ER score satisfies a match threshold; and

performing a recursive re-grouping process comprising: 

re-grouping, using a grouping method, the set of related references based in part on their respectively associated ER scores forming additional new input data based on the re-grouping of the references, the respectively associated ER scores comprising the paralyzed ER score; 

re-assigning the group of each of the N references based on the re-grouping; and 

recalculating a pairwise matching of the additional new input data formed based on the re-grouping of the references.
Claim 26: The computer program product of claim 25, wherein the group of N references is represented as a graph, wherein each reference is represented as a graph node, and wherein each graph edge represents the relationship between a pair of nodes connected by the edge.

Similarly, claims 33 and 40.

Claim 2: The computer program product of claim 1, wherein the group of N references is represented as a graph, wherein each reference is represented as a graph node, and wherein each graph edge represents the relationship between a pair of nodes connected by the edge.

Claim 27: The computer program product of claim 26, wherein the ER score calculated for a pair of the related references is associated with the edge connecting the pair of related references.

Similarly, claims 34 and 41.

Claim 3: The computer program product of claim 2, wherein the ER score calculated for a pair of the related references is associated with the edge connecting the pair of related references.

Claim 28: The computer program product of claim 25, wherein receiving the group of N references is preceded by selecting the candidate ER error entity based on an ER error score that represents the likelihood that the entity is described by erroneous ER data.

Similarly, claims 35 and 42.

Claim 4: The computer program product of claim 1, wherein receiving the group of N references is preceded by selecting the candidate ER error entity based on an ER error score that represents the likelihood that the entity is described by erroneous ER data.

Claim 29: The computer program product of claim 28, wherein the ER error score includes one or more of a count of the number of unique references in the group of N references and a count of the number of duplicates of the candidate ER error entity that are identified within the persistent data stored in the data repository.

Similarly, claims 36 and 43.

Claim 5: The computer program product of claim 4, wherein the ER error score includes one or more of a count of the number of unique references in the group of N references and a count of the number of duplicates of the candidate ER error entity that are identified within the persistent data stored in the data repository.

Claim 30: The computer program product of claim 28, wherein the group of N references is represented as a graph, and wherein the ER error score may be derived based in part on an analysis of the graph edges.

Similarly, claims 37 and 44.

Claim 6: The computer program product of claim 4, wherein the group of N references is represented as a graph, and wherein the ER error score may be derived based in part on an analysis of the graph edges.

Claim 31: The computer program product of claim 25, wherein calculation of each ER score is implemented using a machine learning algorithm, wherein a binary classifier, derived using supervised machine learning, is trained to return a result label of "match" or "no match" as a decision of whether or not an input pair of entity references describes the same entity, and wherein the result label is returned with a value of an ER score representative of a certainty in the decision.

Similarly, claims 38 and 45.

Claim 1: … 
wherein calculation of each ER score is implemented using a machine learning algorithm, wherein a binary classifier, derived using supervised machine learning, is trained to return a result label of “match” or “no match” as a decision of whether or not an input pair of entity references describes the same entity, and wherein the result label is returned with a value of an ER score representative of a certainty in the decision…
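For illustration only, the pair-selection limitation recited in claim 1 of the reference patent (an ER score threshold combined with a predefined maximum number of pairs) can be sketched as follows. The function name, the data layout, and the lowest-score-first ordering are assumptions made for this sketch; neither set of claims specifies an ordering, and this is not asserted to be the applicant's implementation.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def select_pairs_for_validation(
    scores: Dict[Pair, float], threshold: float, max_pairs: int
) -> List[Pair]:
    """Return at most max_pairs pairs whose ER score fails to meet threshold.

    Hypothetical helper: the ordering (lowest score first) is an assumption.
    """
    # Any pair whose score fails to meet the threshold is a candidate.
    candidates = [pair for pair, score in scores.items() if score < threshold]
    # Assumed ordering: lowest ER score first.
    candidates.sort(key=lambda pair: scores[pair])
    # Cap the selection at the predefined maximum.
    return candidates[:max_pairs]


example_scores = {
    ("r1", "r2"): 0.90,
    ("r1", "r3"): 0.40,
    ("r2", "r3"): 0.20,
    ("r3", "r4"): 0.45,
}
selected = select_pairs_for_validation(example_scores, threshold=0.5, max_pairs=2)
```

With these hypothetical inputs, three pairs fall below the threshold but only two are selected, so the selected set is a proper subset of the related pairs.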


Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 25-30, 32-37 and 39-44 are rejected under 35 U.S.C. 103 as being unpatentable over Li, US 20150363476 A1 (hereinafter “Li”) in view of Misra et al., US 9245015 B2 (hereinafter “Misra”) and further in view of Woytowitz et al., US 8370328 B2 (hereinafter “Woytowitz”).

Claim 25: Li teaches a computer program product, stored on a non-transitory computer readable medium, comprising instructions that when executed on one or more computers cause the one or more computers to perform operations implementing automatic entity resolution (ER) data cleaning, the operations comprising:
receiving a group of N references to a candidate ER error entity (Li, [0034] note Mentions of entities within a document are identified. Candidate entities associated with each mention are identified. Mention-entity pairs are defined for each mention and the associated candidate entities, [Fig. 9] note 916, [0110] note a graph is constructed such that each node in the graph corresponds to a mention-entity pair),
wherein the references and the candidate ER entity are persistent data stored in a data repository (Li, [0031] note matching the mentions to corresponding entities stored in a knowledge base, or knowledge repository), and
wherein the references each include attribute data describing the candidate ER entity (Li, [0031] note a knowledge base, or knowledge repository, which also stores attributes and other information related to each entity);
identifying a set of related references within the group of N references, wherein each related reference is related to at least one other reference of the group of N references  (Li, [Fig. 9] note 918, [0110] note edges are constructed in the graph between nodes that have relationships between each other. The relationships between nodes correspond to relationships between the candidate entities in the mention-entity pairs);
for each pair of references identified as a part of the set of the related references, calculating an ER score representing a likelihood that the pair of related references refers to the candidate ER error entity (Li, [Fig. 9] note 920, [0035] note each node is assigned the prior confidence score of a corresponding mention-entity pair as a node score, [0106] note the confidence score indicates a conditional prior probability that the mention in the mention-entity pair refers to the candidate entity in the mention-entity pair, [0110] note the node score for each node is updated as a function of the node score and the edge scores of incoming and outgoing edges for the node; i.e. calculating an ER score);
performing a re-grouping process comprising: re-grouping, using a grouping method, the set of related references based in part on their respectively associated ER scores (Li, [Fig. 9] note output entities are generated that correspond to the selected nodes… the output entities are grouped into clusters); and
re-assigning the group of each of the N references based on the re-grouping (Li, [0083] note after clustering, a unique cluster ID is assigned to each cluster, [0084] note entities with the same cluster ID (i.e., in the same cluster)).
Li does not explicitly teach determining based on the calculation of each ER score, that an ER error has occurred in grouping the references; selecting a set of the pairs of the related references for validation of their ER scores; sending the selected set of the pairs to an oracle for validation of their ER scores; receiving validated ER scores from the oracle, the validated ER scores indicative of a match or non-match and are associated with a value between 0 and 1; adjusting at least one of the validated ER scores by calculating a paralyzed ER score including determining whether the validated ER score satisfies a match threshold; that the re-grouping process is recursive; forming additional new input data based on the re-grouping of the references, the respectively associated ER scores comprising the paralyzed ER score; and recalculating a pairwise matching of the additional new input data formed based on the re-grouping of the references.
However, Misra teaches determining based on the calculation of each ER score, that an ER error has occurred in grouping the references (Misra, [Col. 24 Lines 32-39] note an overall alias score of one may indicate that the terms, associated with the overall alias score, are alias terms. In some implementations, an overall alias score of zero may indicate that the terms, associated with the overall alias score, are not alias terms. In some implementations, an overall alias score that satisfies a threshold (e.g., greater than 0.5) may indicate that the terms, associated with the overall alias score, are alias terms; i.e., the examiner interprets an alias score below 0.5 as reading on a likely ER error);
selecting a set of the pairs of the related references for validation of their ER scores, sending the selected set of the pairs to an oracle for validation of their ER scores (Misra, [Col. 28 Lines 22-37] note user interface 805 may provide a mechanism (e.g., a checkbox) for a user to indicate whether two terms are alias terms… boxes associated with terms having an alias score that satisfies a first threshold (e.g., greater than 0.5) may be checked by default, and boxes associated with terms having an alias score that satisfies a second threshold (e.g., greater than 0, but less than 0.5) may be unchecked by default, [Fig. 8A], [Fig. 8B], [Col. 30 Lines 9-20] note the user interfaces may be customizable by a device. Additionally, or alternatively, the user interfaces may be pre-configured to standard configuration);
receiving validated ER scores from the oracle, the validated ER scores indicative of a match or non-match and are associated with a value between 0 and 1 (Misra, [Col. 24 Lines 32-36] note an overall alias score of one may indicate that the terms, associated with the overall alias score, are alias terms… an overall alias score of zero may indicate that the terms, associated with the overall alias score, are not alias terms, [Col. 28 Lines 60-63] note user indications may be incorporated into a calculation of the overall alias score… the overall score for a pair of terms may be set to a particular value (e.g., 1); i.e. a score from 0 to 1);
adjusting at least one of the validated ER scores by calculating a paralyzed ER score including determining whether the validated ER score satisfies a match threshold (Misra, [Col. 28 Lines 60-63] note user indications may be incorporated into a calculation of the overall alias score… the overall score for a pair of terms may be set to a particular value (e.g., 1), based on a user indication that the terms are alias terms); and
the respectively associated ER scores comprising the paralyzed ER score (Misra, [Col. 28 Lines 60-63] note user indications may be incorporated into a calculation of the overall alias score… the overall score for a pair of terms may be set to a particular value (e.g., 1); i.e. incorporated into a calculation of the overall alias score = ER scores comprising the paralyzed ER score).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the entity resolution of Li with the alias score based on user indication of Misra according to known methods (i.e., prompting a user to indicate a mention-entity pair). Motivation for doing so is that this may eliminate confusion resulting from the inconsistent use of terms in a document, by determining terms in the document that refer to the same entity (Misra, [Col. 2 Lines 37-40]).
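For illustration only, the cited Misra passages (an alias score between 0 and 1, a threshold such as 0.5, and a user indication that sets the score to a particular value such as 1) can be sketched as follows. The function names are hypothetical, and setting the score to 0.0 on a negative indication is an assumption that mirrors Misra's statement that a score of zero may indicate non-alias terms.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def is_alias(score: float, threshold: float = 0.5) -> bool:
    """A score greater than the threshold indicates alias terms (a match)."""
    return score > threshold


def apply_user_indications(
    scores: Dict[Pair, float], indications: Dict[Pair, bool]
) -> Dict[Pair, float]:
    """Return a copy of scores with user-validated pairs overwritten.

    A positive indication sets the score to 1.0. A negative indication sets
    it to 0.0 (assumption; not stated in the cited passage).
    """
    adjusted = dict(scores)
    for pair, user_says_alias in indications.items():
        adjusted[pair] = 1.0 if user_says_alias else 0.0
    return adjusted


alias_scores = {("IBM", "Big Blue"): 0.4, ("IBM", "Apple"): 0.7}
validated = apply_user_indications(
    alias_scores, {("IBM", "Big Blue"): True, ("IBM", "Apple"): False}
)
```

In this hypothetical example, the user's checkbox input reverses both automatic decisions: the first pair becomes a match and the second a non-match.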
Li and Misra do not explicitly teach that the re-grouping process is recursive; forming additional new input data based on the re-grouping of the references; and recalculating a pairwise matching of the additional new input data formed based on the re-grouping of the references.
However, Woytowitz teaches this (Woytowitz, [Col. 11 Lines 54-] note read all of the comparison results for a given mention group and an algorithm and recursively group together into new entity objects all of the pairs of mention objects that satisfy a specified confidence score threshold. Entity creation module 68 also compares all of the mention objects in pairs of previously-created entity objects and merges entity objects when the comparison between the preexisting entity objects yields a result that meets a confidence threshold; i.e. creating new entity objects = forming additional new input data, comparing all of the mention objects in pairs of previously-created entity objects = recalculating a pairwise matching, [Col. 16 Lines 21-28] note "recluster" mention objects and entity objects… a new Clustering is created and appended to the file each time a different hierarchical fuzzy logic comparison algorithm is run to "recluster" mention objects to and thereby revise the Entity Objects).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the entity linking of Li and Misra with the recursive entity grouping of Woytowitz according to known methods (i.e., recursively group previously created entity objects into new entity objects). Motivation for doing so is that processing accuracy and throughput of the system are dramatically improved as compared to conventional disambiguation systems (Woytowitz, [Col. 10 Lines 58-60]).
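For illustration only, grouping together all pairs that satisfy a confidence score threshold, with groups merged transitively, can be sketched with a union-find structure. This is a stand-in technique chosen for brevity and is not asserted to be the algorithm of Woytowitz or of the applicant; the function name and data layout are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def regroup(
    references: List[str], scores: Dict[Pair, float], threshold: float
) -> List[List[str]]:
    """Merge references linked by any pair whose score meets the threshold."""
    parent = {ref: ref for ref in references}

    def find(ref: str) -> str:
        # Follow parent links to the group representative, halving the path.
        while parent[ref] != ref:
            parent[ref] = parent[parent[ref]]
            ref = parent[ref]
        return ref

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[str, List[str]] = {}
    for ref in references:
        groups.setdefault(find(ref), []).append(ref)
    # Sort for a deterministic result.
    return sorted(sorted(group) for group in groups.values())


new_groups = regroup(
    ["r1", "r2", "r3", "r4"],
    {("r1", "r2"): 0.9, ("r2", "r3"): 0.8, ("r3", "r4"): 0.1},
    threshold=0.5,
)
```

Here r1 and r3 end up in the same group through r2 even though no score links them directly, which is the transitive effect of repeated pairwise merging.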

Claim 26: Li, Misra and Woytowitz teach the computer program product of claim 25, wherein the group of N references is represented as a graph, wherein each reference is represented as a graph node, and wherein each graph edge represents the relationship between a pair of nodes connected by the edge (Li, [0035] note the graph consists of a set of nodes (also known as "vertices"), where some pairs of nodes are connected by directional edges (also known as "arcs"). Each node in the graph corresponds to a mention-entity pair).

Claim 27: Li, Misra and Woytowitz teach the computer program product of claim 26, wherein the ER score calculated for a pair of the related references is associated with the edge connecting the pair of related references (Li, [0110] note the node score for each node is updated as a function of the node score and the edge scores of incoming and outgoing edges for the node).
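For illustration only, the graph representation recited in claims 26 and 27 (each reference as a node, each edge carrying the ER score of the pair it connects) can be sketched as an adjacency mapping. The undirected layout and the function name are assumptions of this sketch; Li's graph, by contrast, is described as using directional edges.

```python
from collections import defaultdict
from typing import Dict, Tuple

Pair = Tuple[str, str]


def build_reference_graph(pair_scores: Dict[Pair, float]) -> Dict[str, Dict[str, float]]:
    """Map each reference (node) to its neighbors and the edge's ER score."""
    graph: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (a, b), score in pair_scores.items():
        # Store the score on the edge in both directions (undirected; assumption).
        graph[a][b] = score
        graph[b][a] = score
    return dict(graph)


graph = build_reference_graph({("r1", "r2"): 0.9, ("r2", "r3"): 0.3})
```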

Claim 28: Li, Misra and Woytowitz teach the computer program product of claim 25, wherein receiving the group of N references is preceded by selecting the candidate ER error entity based on an ER error score that represents the likelihood that the entity is described by erroneous ER data (Li, [Fig. 9] note 920, [0035] note each node is assigned the prior confidence score of a corresponding mention-entity pair as a node score, [0106] note the confidence score indicates a conditional prior probability that the mention in the mention-entity pair refers to the candidate entity in the mention-entity pair; prior confidence score = ER error score).

Claim 29: Li, Misra and Woytowitz teach the computer program product of claim 28, wherein the ER error score includes one or more of a count of the number of unique references in the group of N references and a count of the number of duplicates of the candidate ER error entity that are identified within the persistent data stored in the data repository (Li, [0049] note the score of each candidate entity for a given mention-entity pair represents [0050] p(ei|mj), i.e., the prior probability of mention mj referring to entity ei given the condition that mj appears in the document, [0054] note the score may be determined as count(mj, ei)/count(mj)).
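For illustration only, the count-based score cited from Li, count(mj, ei)/count(mj), can be computed as follows. The function name, the input format (a list of observed mention-entity pairs), and the sample data are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def prior_probability(
    observations: List[Tuple[str, str]], mention: str, entity: str
) -> float:
    """Estimate p(entity | mention) as count(mention, entity) / count(mention)."""
    pair_counts = Counter(observations)
    mention_counts = Counter(m for m, _ in observations)
    if mention_counts[mention] == 0:
        return 0.0  # mention never observed (assumed convention)
    return pair_counts[(mention, entity)] / mention_counts[mention]


# Hypothetical observations: the mention "Jordan" linked to two entities.
observed = [("Jordan", "Michael Jordan")] * 3 + [("Jordan", "Jordan (country)")]
p = prior_probability(observed, "Jordan", "Michael Jordan")  # 3 / 4
```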

Claim 30: Li, Misra and Woytowitz teach the computer program product of claim 28, wherein the group of N references is represented as a graph (Li, [0035] note the graph consists of a set of nodes (also known as "vertices"), where some pairs of nodes are connected by directional edges (also known as "arcs"). Each node in the graph corresponds to a mention-entity pair), and 
wherein the ER error score may be derived based in part on an analysis of the graph edges (Li, [Fig. 9] note 920, [0035] note each node is assigned the prior confidence score of a corresponding mention-entity pair as a node score).

Claim 32: Li teaches a method comprising:
receiving a group of N references to a candidate ER error entity (Li, [0034] note Mentions of entities within a document are identified. Candidate entities associated with each mention are identified. Mention-entity pairs are defined for each mention and the associated candidate entities, [Fig. 9] note 916, [0110] note a graph is constructed such that each node in the graph corresponds to a mention-entity pair),
wherein the references and the candidate ER entity are persistent data stored in a data repository (Li, [0031] note matching the mentions to corresponding entities stored in a knowledge base, or knowledge repository), and
wherein the references each include attribute data describing the candidate ER entity (Li, [0031] note a knowledge base, or knowledge repository, which also stores attributes and other information related to each entity);
identifying a set of related references within the group of N references, wherein each related reference is related to at least one other reference of the group of N references  (Li, [Fig. 9] note 918, [0110] note edges are constructed in the graph between nodes that have relationships between each other. The relationships between nodes correspond to relationships between the candidate entities in the mention-entity pairs);
for each pair of references identified as a part of the set of the related references, calculating an ER score representing a likelihood that the pair of related references refers to the candidate ER error entity (Li, [Fig. 9] note 920, [0035] note each node is assigned the prior confidence score of a corresponding mention-entity pair as a node score, [0106] note the confidence score indicates a conditional prior probability that the mention in the mention-entity pair refers to the candidate entity in the mention-entity pair, [0110] note the node score for each node is updated as a function of the node score and the edge scores of incoming and outgoing edges for the node; i.e. calculating an ER score);
performing a re-grouping process comprising: re-grouping, using a grouping method, the set of related references based in part on their respectively associated ER scores (Li, [Fig. 9] note output entities are generated that correspond to the selected nodes… the output entities are grouped into clusters); and
re-assigning the group of each of the N references based on the re-grouping (Li, [0083] note after clustering, a unique cluster ID is assigned to each cluster, [0084] note entities with the same cluster ID (i.e., in the same cluster)).
Li does not explicitly teach determining based on the calculation of each ER score, that an ER error has occurred in grouping the references; selecting a set of the pairs of the related references for validation of their ER scores; sending the selected set of the pairs to an oracle for validation of their ER scores; receiving validated ER scores from the oracle, the validated ER scores indicative of a match or non-match and are associated with a value between 0 and 1; adjusting at least one of the validated ER scores by calculating a paralyzed ER score including determining whether the validated ER score satisfies a match threshold; that the re-grouping process is recursive; forming additional new input data based on the re-grouping of the references, the respectively associated ER scores comprising the paralyzed ER score; and recalculating a pairwise matching of the additional new input data formed based on the re-grouping of the references.
However, Misra teaches determining based on the calculation of each ER score, that an ER error has occurred in grouping the references (Misra, [Col. 24 Lines 32-39] note an overall alias score of one may indicate that the terms, associated with the overall alias score, are alias terms. In some implementations, an overall alias score of zero may indicate that the terms, associated with the overall alias score, are not alias terms. In some implementations, an overall alias score that satisfies a threshold (e.g., greater than 0.5) may indicate that the terms, associated with the overall alias score, are alias terms; i.e., the examiner interprets an alias score below 0.5 as reading on a likely ER error);
selecting a set of the pairs of the related references for validation of their ER scores, sending the selected set of the pairs to an oracle for validation of their ER scores (Misra, [Col. 28 Lines 22-37] note user interface 805 may provide a mechanism (e.g., a checkbox) for a user to indicate whether two terms are alias terms… boxes associated with terms having an alias score that satisfies a first threshold (e.g., greater than 0.5) may be checked by default, and boxes associated with terms having an alias score that satisfies a second threshold (e.g., greater than 0, but less than 0.5) may be unchecked by default, [Fig. 8A], [Fig. 8B], [Col. 30 Lines 9-20] note the user interfaces may be customizable by a device. Additionally, or alternatively, the user interfaces may be pre-configured to standard configuration);
receiving validated ER scores from the oracle, the validated ER scores indicative of a match or non-match and are associated with a value between 0 and 1 (Misra, [Col. 24 Lines 32-36] note an overall alias score of one may indicate that the terms, associated with the overall alias score, are alias terms… an overall alias score of zero may indicate that the terms, associated with the overall alias score, are not alias terms, [Col. 28 Lines 60-63] note user indications may be incorporated into a calculation of the overall alias score… the overall score for a pair of terms may be set to a particular value (e.g., 1); i.e. a score from 0 to 1);
adjusting at least one of the validated ER scores by calculating a paralyzed ER score including determining whether the validated ER score satisfies a match threshold (Misra, [Col. 28 Lines 60-63] note user indications may be incorporated into a calculation of the overall alias score… the overall score for a pair of terms may be set to a particular value (e.g., 1), based on a user indication that the terms are alias terms); and
the respectively associated ER scores comprising the paralyzed ER score (Misra, [Col. 28 Lines 60-63] note user indications may be incorporated into a calculation of the overall alias score… the overall score for a pair of terms may be set to a particular value (e.g., 1); i.e. incorporated into a calculation of the overall alias score = ER scores comprising the paralyzed ER score).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the entity resolution of Li with the alias score based on user indication of Misra according to known methods (i.e., prompting a user to indicate a mention-entity pair). Motivation for doing so is that this may eliminate confusion resulting from the inconsistent use of terms in a document, by determining terms in the document that refer to the same entity (Misra, [Col. 2 Lines 37-40]).
Li and Misra do not explicitly teach that the re-grouping process is recursive; forming additional new input data based on the re-grouping of the references; and recalculating a pairwise matching of the additional new input data formed based on the re-grouping of the references.
However, Woytowitz teaches this (Woytowitz, [Col. 11 Lines 54-] note read all of the comparison results for a given mention group and an algorithm and recursively group together into new entity objects all of the pairs of mention objects that satisfy a specified confidence score threshold. Entity creation module 68 also compares all of the mention objects in pairs of previously-created entity objects and merges entity objects when the comparison between the preexisting entity objects yields a result that meets a confidence threshold; i.e. creating new entity objects = forming additional new input data, comparing all of the mention objects in pairs of previously-created entity objects = recalculating a pairwise matching, [Col. 16 Lines 21-28] note "recluster" mention objects and entity objects… a new Clustering is created and appended to the file each time a different hierarchical fuzzy logic comparison algorithm is run to "recluster" mention objects to and thereby revise the Entity Objects).
It would have been obvious to one of ordinary skill in the art at the time of the application's filing to combine the entity linking of Li and Misra with the recursive entity grouping of Woytowitz according to known methods (i.e. recursively group previously created entity objects into new entity objects). Motivation for doing so is that processing accuracy and throughput of the system are dramatically improved as compared to conventional disambiguation systems (Woytowitz, [Col. 10 Lines 58-60]).
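
For illustration only, and not as a reproduction of any cited disclosure, the threshold-based grouping noted above from Woytowitz can be sketched as follows. The sketch merges every pair of mention objects whose comparison score meets a confidence threshold and lets the merges propagate transitively, which is one iterative way to approximate the recursive grouping described. The function name `group_by_threshold`, the union-find approach, the sample scores, and the 0.7 threshold are all assumptions made for this example.

```python
def group_by_threshold(pair_scores, threshold):
    """Group items whose pairwise score meets the threshold (hypothetical helper).

    pair_scores maps (item_a, item_b) tuples to a confidence score.
    Merges are transitive: if a~b and b~c both pass, a, b and c share a group.
    """
    parent = {}

    def find(x):
        # Register unseen items as their own group, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b  # merge the two groups

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(members) for members in groups.values())


# Assumed sample comparison results between mention objects.
pair_scores = {("m1", "m2"): 0.9, ("m2", "m3"): 0.8, ("m3", "m4"): 0.2}
entity_groups = group_by_threshold(pair_scores, threshold=0.7)
```

Under these assumed inputs, m1, m2 and m3 fall into one group and m4 remains alone, because only the first two pairs meet the threshold.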

Claim 33: Li, Misra and Woytowitz teach the method of claim 32, wherein the group of N references is represented as a graph, wherein each reference is represented as a graph node, and wherein each graph edge represents the relationship between a pair of nodes connected by the edge (Li, [0035] note the graph consists of a set of nodes (also known as "vertices"), where some pairs of nodes are connected by directional edges (also known as "arcs"). Each node in the graph corresponds to a mention-entity pair).

Claim 34: Li, Misra and Woytowitz teach the method of claim 33, wherein the ER score calculated for a pair of the related references is associated with the edge connecting the pair of related references (Li, [0110] note the node score for each node is updated as a function of the node score and the edge scores of incoming and outgoing edges for the node).

Claim 35: Li, Misra and Woytowitz teach the method of claim 32, wherein receiving the group of N references is preceded by selecting the candidate ER error entity based on an ER error score that represents the likelihood that the entity is described by erroneous ER data (Li, [Fig. 9] note 920, [0035] note each node is assigned the prior confidence score of a corresponding mention-entity pair as a node score, [0106] note the confidence score indicates a conditional prior probability that the mention in the mention-entity pair refers to the candidate entity in the mention-entity pair; prior confidence score = ER error score).

Claim 36: Li, Misra and Woytowitz teach the method of claim 35, wherein the ER error score includes one or more of a count of the number of unique references in the group of N references and a count of the number of duplicates of the candidate ER error entity that are identified within the persistent data stored in the data repository (Li, [0049] note the score of each candidate entity for a given mention-entity pair represents [0050] p(ei|mj), i.e., the prior probability of mention mj referring to entity ei given the condition that mj appears in the document, [0054] note the score may be determined as count(mj, ei)/count(mj)).
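
For illustration only, the count-based score noted above from Li, count(mj, ei)/count(mj), can be sketched as follows. The function name `prior_scores` and the sample observations are assumptions made for this example and do not appear in the reference.

```python
from collections import Counter


def prior_scores(observations):
    """Return count(m, e) / count(m) for each observed (mention, entity) pair.

    observations must be a list of (mention, entity) tuples, since it is
    traversed twice (hypothetical helper, not taken from the reference).
    """
    pair_counts = Counter(observations)
    mention_counts = Counter(mention for mention, _ in observations)
    return {
        (mention, entity): count / mention_counts[mention]
        for (mention, entity), count in pair_counts.items()
    }


# Assumed sample data: three observations of "Jordan", one of "Paris".
observations = [
    ("Jordan", "Michael Jordan"),
    ("Jordan", "Michael Jordan"),
    ("Jordan", "Jordan (country)"),
    ("Paris", "Paris, France"),
]
scores = prior_scores(observations)
```

With this assumed data, the mention "Jordan" refers to "Michael Jordan" in two of its three observations, so that pair receives a score of 2/3.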

Claim 37: Li, Misra and Woytowitz teach the method of claim 35, wherein the group of N references is represented as a graph (Li, [0035] note the graph consists of a set of nodes (also known as "vertices"), where some pairs of nodes are connected by directional edges (also known as "arcs"). Each node in the graph corresponds to a mention-entity pair), and 
wherein the ER error score may be derived based in part on an analysis of the graph edges (Li, [Fig. 9] note 920, [0035] note each node is assigned the prior confidence score of a corresponding mention-entity pair as a node score).

Claim 39: Li teaches a system comprising one or more computers and one or more non-transitory storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations implementing automatic entity resolution (ER) data cleaning, the operations comprising:
receiving a group of N references to a candidate ER error entity (Li, [0034] note Mentions of entities within a document are identified. Candidate entities associated with each mention are identified. Mention-entity pairs are defined for each mention and the associated candidate entities, [Fig. 9] note 916, [0110] note a graph is constructed such that each node in the graph corresponds to a mention-entity pair),
wherein the references and the candidate ER entity are persistent data stored in a data repository (Li, [0031] note matching the mentions to corresponding entities stored in a knowledge base, or knowledge repository), and
wherein the references each include attribute data describing the candidate ER entity (Li, [0031] note a knowledge base, or knowledge repository, which also stores attributes and other information related to each entity);
identifying a set of related references within the group of N references, wherein each related reference is related to at least one other reference of the group of N references (Li, [Fig. 9] note 918, [0110] note edges are constructed in the graph between nodes that have relationships between each other. The relationships between nodes correspond to relationships between the candidate entities in the mention-entity pairs);
for each pair of references identified as a part of the set of the related references, calculating an ER score representing a likelihood that the pair of related references refers to the candidate ER error entity (Li, [Fig. 9] note 920, [0035] note each node is assigned the prior confidence score of a corresponding mention-entity pair as a node score, [0106] note the confidence score indicates a conditional prior probability that the mention in the mention-entity pair refers to the candidate entity in the mention-entity pair, [0110] note the node score for each node is updated as a function of the node score and the edge scores of incoming and outgoing edges for the node; i.e. calculating an ER score);
performing a re-grouping process comprising: re-grouping, using a grouping method, the set of related references based in part on their respectively associated ER scores (Li, [Fig. 9] note output entities are generated that correspond to the selected nodes… the output entities are grouped into clusters); and
re-assigning the group of each of the N references based on the re-grouping (Li, [0083] note after clustering, a unique cluster ID is assigned to each cluster, [0084] note entities with the same cluster ID (i.e., in the same cluster)).
Li does not explicitly teach determining based on the calculation of each ER score, that an ER error has occurred in grouping the references; selecting a set of the pairs of the related references for validation of their ER scores, sending the selected set of the pairs to an oracle for validation of their ER scores; receiving validated ER scores from the oracle, the validated ER scores indicative of a match or non-match and are associated with a value between 0 and 1; adjusting at least one of the validated ER scores by calculating a paralyzed ER score including determining whether the validated ER score satisfies a match threshold; and recursive; forming additional new input data based on the re-grouping of the references, the respectively associated ER scores comprising the paralyzed ER score; recalculating a pairwise matching of the additional new input data formed based on the re-grouping of the references.
However, Misra teaches determining based on the calculation of each ER score, that an ER error has occurred in grouping the references (Misra, [Col. 24 Lines 32-39] note an overall alias score of one may indicate that the terms, associated with the overall alias score, are alias terms. In some implementations, an overall alias score of zero may indicate that the terms, associated with the overall alias score, are not alias terms. In some implementations, an overall alias score that satisfies a threshold (e.g., greater than 0.5) may indicate that the terms, associated with the overall alias score, are alias terms; i.e. the examiner interprets an alias score below 0.5 as reading on a likely ER error);
selecting a set of the pairs of the related references for validation of their ER scores, sending the selected set of the pairs to an oracle for validation of their ER scores (Misra, [Col. 28 Lines 22-37] note user interface 805 may provide a mechanism (e.g., a checkbox) for a user to indicate whether two terms are alias terms… boxes associated with terms having an alias score that satisfies a first threshold (e.g., greater than 0.5) may be checked by default, and boxes associated with terms having an alias score that satisfies a second threshold (e.g., greater than 0, but less than 0.5) may be unchecked by default, [Fig. 8A], [Fig. 8B], [Col. 30 Lines 9-20] note the user interfaces may be customizable by a device. Additionally, or alternatively, the user interfaces may be pre-configured to standard configuration);
receiving validated ER scores from the oracle, the validated ER scores indicative of a match or non-match and are associated with a value between 0 and 1 (Misra, [Col. 24 Lines 32-36] note an overall alias score of one may indicate that the terms, associated with the overall alias score, are alias terms… an overall alias score of zero may indicate that the terms, associated with the overall alias score, are not alias terms, [Col. 28 Lines 60-63] note user indications may be incorporated into a calculation of the overall alias score… the overall score for a pair of terms may be set to a particular value (e.g., 1); i.e. a score from 0 to 1);
adjusting at least one of the validated ER scores by calculating a paralyzed ER score including determining whether the validated ER score satisfies a match threshold (Misra, [Col. 28 Lines 60-63] note user indications may be incorporated into a calculation of the overall alias score… the overall score for a pair of terms may be set to a particular value (e.g., 1), based on a user indication that the terms are alias terms); and
the respectively associated ER scores comprising the paralyzed ER score (Misra, [Col. 28 Lines 60-63] note user indications may be incorporated into a calculation of the overall alias score… the overall score for a pair of terms may be set to a particular value (e.g., 1); i.e. incorporated into a calculation of the overall alias score = ER scores comprising the paralyzed ER score).
It would have been obvious to one of ordinary skill in the art at the time of the application's filing to combine the entity resolution of Li with the alias score based on user indication of Misra according to known methods (i.e. prompting a user to indicate a mention-entity pair). Motivation for doing so is that this may eliminate confusion resulting from the inconsistent use of terms in a document, by determining terms in the document that refer to the same entity (Misra, [Col. 2 Lines 37-40]).
Li and Misra do not explicitly teach recursive; forming additional new input data based on the re-grouping of the references; and recalculating a pairwise matching of the additional new input data formed based on the re-grouping of the references.
However, Woytowitz teaches this (Woytowitz, [Col. 11 Lines 54-] note read all of the comparison results for a given mention group and an algorithm and recursively group together into new entity objects all of the pairs of mention objects that satisfy a specified confidence score threshold. Entity creation module 68 also compares all of the mention objects in pairs of previously-created entity objects and merges entity objects when the comparison between the preexisting entity objects yields a result that meets a confidence threshold; i.e. creating new entity objects = forming additional new input data, comparing all of the mention objects in pairs of previously-created entity objects = recalculating a pairwise matching, [Col. 16 Lines 21-28] note "recluster" mention objects and entity objects… a new Clustering is created and appended to the file each time a different hierarchical fuzzy logic comparison algorithm is run to "recluster" mention objects to and thereby revise the Entity Objects).
It would have been obvious to one of ordinary skill in the art at the time of the application's filing to combine the entity linking of Li and Misra with the recursive entity grouping of Woytowitz according to known methods (i.e. recursively group previously created entity objects into new entity objects). Motivation for doing so is that processing accuracy and throughput of the system are dramatically improved as compared to conventional disambiguation systems (Woytowitz, [Col. 10 Lines 58-60]).
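
For illustration only, the Misra passages noted above (checkbox defaults keyed to an alias score threshold, and a user indication being incorporated into the overall alias score) can be sketched as follows. The function names, the parameter names, and the choice to set the score to 0 when the user indicates the terms are not aliases are assumptions made for this example; the cited passage gives only the example of setting the score to 1.

```python
def default_checked(alias_score, first_threshold=0.5):
    """Whether the checkbox for a term pair starts checked (hypothetical helper)."""
    return alias_score > first_threshold


def validated_score(alias_score, user_says_alias=None):
    """Fold an optional user indication into the overall alias score.

    With no indication, the computed score is kept. A positive indication
    sets the score to 1. Setting 0 on a negative indication is an assumption
    of this sketch, not something stated in the cited passage.
    """
    if user_says_alias is None:
        return alias_score
    return 1.0 if user_says_alias else 0.0


# Assumed sample pair: computed score 0.4, then confirmed by the user.
starts_checked = default_checked(0.4)
final_score = validated_score(0.4, user_says_alias=True)
```

Here the pair would be presented unchecked by default, and the user's confirmation would raise its score to 1.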

Claim 40: Li, Misra and Woytowitz teach the system of claim 39, wherein the group of N references is represented as a graph, wherein each reference is represented as a graph node, and wherein each graph edge represents the relationship between a pair of nodes connected by the edge (Li, [0035] note the graph consists of a set of nodes (also known as "vertices"), where some pairs of nodes are connected by directional edges (also known as "arcs"). Each node in the graph corresponds to a mention-entity pair).

Claim 41: Li, Misra and Woytowitz teach the system of claim 40, wherein the ER score calculated for a pair of the related references is associated with the edge connecting the pair of related references (Li, [0110] note the node score for each node is updated as a function of the node score and the edge scores of incoming and outgoing edges for the node).

Claim 42: Li, Misra and Woytowitz teach the system of claim 39, wherein receiving the group of N references is preceded by selecting the candidate ER error entity based on an ER error score that represents the likelihood that the entity is described by erroneous ER data (Li, [Fig. 9] note 920, [0035] note each node is assigned the prior confidence score of a corresponding mention-entity pair as a node score, [0106] note the confidence score indicates a conditional prior probability that the mention in the mention-entity pair refers to the candidate entity in the mention-entity pair; prior confidence score = ER error score).

Claim 43: Li, Misra and Woytowitz teach the system of claim 42, wherein the ER error score includes one or more of a count of the number of unique references in the group of N references and a count of the number of duplicates of the candidate ER error entity that are identified within the persistent data stored in the data repository (Li, [0049] note the score of each candidate entity for a given mention-entity pair represents [0050] p(ei|mj), i.e., the prior probability of mention mj referring to entity ei given the condition that mj appears in the document, [0054] note the score may be determined as count(mj, ei)/count(mj)).


Claim 44: Li, Misra and Woytowitz teach the system of claim 42, wherein the group of N references is represented as a graph (Li, [0035] note the graph consists of a set of nodes (also known as "vertices"), where some pairs of nodes are connected by directional edges (also known as "arcs"). Each node in the graph corresponds to a mention-entity pair), and 
wherein the ER error score may be derived based in part on an analysis of the graph edges (Li, [Fig. 9] note 920, [0035] note each node is assigned the prior confidence score of a corresponding mention-entity pair as a node score).

Claims 31, 38 and 45 are rejected under 35 U.S.C. 103 as being unpatentable over Li, Misra and Woytowitz in further view of Agarwal et al., US 20150254329 A1 (hereinafter “Agarwal”).

Claim 31: Li, Misra and Woytowitz do not explicitly teach the computer program product of claim 25, wherein calculation of each ER score is implemented using a machine learning algorithm, wherein a binary classifier, derived using supervised machine learning, is trained to return a result label of "match" or "no match" as a decision of whether or not an input pair of entity references describes the same entity, and wherein the result label is returned with a value of an ER score representative of a certainty in the decision.
However, Agarwal teaches this (Agarwal, [0010] note ER analysis includes a large number of records to be processed in order to resolve the entities involved, [0011] note a pair-wise comparison of the textual documents is performed in each block to identify a set of textual documents pertaining to an entity. In the pair-wise comparison, based on a match function, two textual documents are considered as matching. The match function may include but is not limited to predefined rules, and binary classifiers derived using machine learning, [0054] note The computation module 124 may consider two documents as "Matching" if the two documents may return a value, e.g., "True" under some match function. In one implementation, a match function may be a Boolean function defined over two documents that may return "True", when the two documents are determined to be belonging to the same entity. On the other hand, in case the two documents may return a value "False", the two documents are determined to be not-matching; i.e. a “match” or “no match”).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the entity resolution of Li, Misra and Woytowitz with the entity resolution match function including binary classifiers derived using machine learning of Agarwal according to known methods (i.e. identifying matching entities based on a binary classifier derived using machine learning). Motivation for doing so is that this provides an approach for a time-saving, accurate, and inexpensive ER analysis (Agarwal, [0031]).
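
For illustration only, a match function over two documents of the kind noted above from Agarwal can be sketched as follows, here returning a "match" or "no match" label together with a score. The token-overlap (Jaccard) rule is a stand-in chosen for this example and is not a trained classifier; the function name `match`, the 0.5 threshold, and the sample documents are likewise assumptions.

```python
def match(doc_a, doc_b, threshold=0.5):
    """Return a ("match" | "no match", score) decision for two documents.

    The score is the Jaccard overlap of lowercase whitespace tokens, used
    here only as a hypothetical stand-in for a learned match function.
    """
    tokens_a = set(doc_a.lower().split())
    tokens_b = set(doc_b.lower().split())
    union = tokens_a | tokens_b
    score = len(tokens_a & tokens_b) / len(union) if union else 0.0
    label = "match" if score >= threshold else "no match"
    return label, score


# Assumed sample documents.
same_entity = match("Acme Corp New York", "acme corp new york")
different_entity = match("Acme Corp", "Zenith Ltd")
```

In this sketch the first pair shares every token and is labeled a match, while the second pair shares none and is labeled no match.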

Claim 38: Li, Misra and Woytowitz do not explicitly teach the method of claim 32, wherein calculation of each ER score is implemented using a machine learning algorithm, wherein a binary classifier, derived using supervised machine learning, is trained to return a result label of "match" or "no match" as a decision of whether or not an input pair of entity references describes the same entity, and wherein the result label is returned with a value of an ER score representative of a certainty in the decision.
However, Agarwal teaches this (Agarwal, [0010] note ER analysis includes a large number of records to be processed in order to resolve the entities involved, [0011] note a pair-wise comparison of the textual documents is performed in each block to identify a set of textual documents pertaining to an entity. In the pair-wise comparison, based on a match function, two textual documents are considered as matching. The match function may include but is not limited to predefined rules, and binary classifiers derived using machine learning, [0054] note The computation module 124 may consider two documents as "Matching" if the two documents may return a value, e.g., "True" under some match function. In one implementation, a match function may be a Boolean function defined over two documents that may return "True", when the two documents are determined to be belonging to the same entity. On the other hand, in case the two documents may return a value "False", the two documents are determined to be not-matching; i.e. a “match” or “no match”).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the entity resolution of Li, Misra and Woytowitz with the entity resolution match function including binary classifiers derived using machine learning of Agarwal according to known methods (i.e. identifying matching entities based on a binary classifier derived using machine learning). Motivation for doing so is that this provides an approach for a time-saving, accurate, and inexpensive ER analysis (Agarwal, [0031]).

Claim 45: Li, Misra and Woytowitz do not explicitly teach the system of claim 39, wherein calculation of each ER score is implemented using a machine learning algorithm, wherein a binary classifier, derived using supervised machine learning, is trained to return a result label of "match" or "no match" as a decision of whether or not an input pair of entity references describes the same entity, and wherein the result label is returned with a value of an ER score representative of a certainty in the decision.
However, Agarwal teaches this (Agarwal, [0010] note ER analysis includes a large number of records to be processed in order to resolve the entities involved, [0011] note a pair-wise comparison of the textual documents is performed in each block to identify a set of textual documents pertaining to an entity. In the pair-wise comparison, based on a match function, two textual documents are considered as matching. The match function may include but is not limited to predefined rules, and binary classifiers derived using machine learning, [0054] note The computation module 124 may consider two documents as "Matching" if the two documents may return a value, e.g., "True" under some match function. In one implementation, a match function may be a Boolean function defined over two documents that may return "True", when the two documents are determined to be belonging to the same entity. On the other hand, in case the two documents may return a value "False", the two documents are determined to be not-matching; i.e. a “match” or “no match”).
It would have been obvious to one of ordinary skill in the art at the effective filing date of the application to combine the entity resolution of Li, Misra and Woytowitz with the entity resolution match function including binary classifiers derived using machine learning of Agarwal according to known methods (i.e. identifying matching entities based on a binary classifier derived using machine learning). Motivation for doing so is that this provides an approach for a time-saving, accurate, and inexpensive ER analysis (Agarwal, [0031]).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Giuseppi Giuliani, whose telephone number is (571)270-7128. The examiner can normally be reached Monday-Friday.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aleksandr Kerzhner, can be reached on (571)270-1760. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/GIUSEPPI GIULIANI/Primary Examiner, Art Unit 2165