DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Examiner’s Amendment
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given by Christine E. Orich (Reg. 44987) per email communication on 05/18/2022 following a telephone interview on 05/16/2022.
The application has been amended as follows:

Claim Amendments:

1.	(Currently Amended) A method, comprising:
selecting training data for a first machine learning model based on confidence scores representing likelihoods that pairs of entities in an online system are duplicates;
updating, by one or more computer systems, parameters of the first machine learning model based on features and labels in the training data;
identifying, by the one or more computer systems, a first subset of additional pairs of the entities as duplicate entities based on scores generated by the first machine learning model from values of the features for the additional pairs and a first threshold associated with the scores; and
updating, by the one or more computer systems, content outputted in a user interface of the online system based on the identified first subset of the additional pairs by, for each pair in the first subset of the additional pairs of the entities, determining a canonical entity based on additional features for the pair; and omitting output of a remaining entity that is not the canonical entity in the pair in the user interface of the online system.
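
For illustration only, the identifying and updating steps recited in claim 1 may be sketched as follows. The sketch is not taken from the record: the function name `visible_entities`, the `pick_canonical` callback, the sample entity identifiers, and the use of data completeness as the sole additional feature are all assumptions.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def visible_entities(
    entities: Iterable[str],
    scores: Dict[Pair, float],
    threshold: float,
    pick_canonical: Callable[[str, str], str],
) -> List[str]:
    """Return the entities to display after hiding non-canonical duplicates."""
    hidden: Set[str] = set()
    for (a, b), score in scores.items():
        if score >= threshold:  # pair belongs to the first subset of duplicates
            canonical = pick_canonical(a, b)
            # Omit the remaining entity that is not the canonical entity.
            hidden.add(b if canonical == a else a)
    return [e for e in entities if e not in hidden]


# Hypothetical additional feature: completeness of data for each entity.
completeness = {"acme": 0.9, "acme-inc": 0.4, "globex": 0.7}
# Hypothetical scores from the first machine learning model.
scores = {("acme", "acme-inc"): 0.97, ("acme", "globex"): 0.05}

shown = visible_entities(
    completeness,
    scores,
    threshold=0.9,
    pick_canonical=lambda a, b: a if completeness[a] >= completeness[b] else b,
)
print(shown)
```

In this sketch only the pair scoring at or above the threshold is treated as a duplicate, so the less complete entity of that pair is withheld from output while the unrelated entity remains.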


2.	(Canceled) 


3.	(Currently Amended) The method of claim 1, wherein the additional features comprise at least one of:
a completeness of data for each entity in the pair; 
a number of members of the online system associated with each entity in the pair;
a level of interaction between the members of the online system and each entity in the pair; and
a user annotation of a first entity in the pair as the canonical entity.


4.	(Original) The method of claim 1, wherein selecting the training data for the first machine learning model based on the confidence scores representing the likelihoods that the pairs of the entities are duplicates comprises:
generating the pairs of the entities based on similarities in attributes of the entities; and
sampling the training data to include subsets of the pairs of the entities associated with different ranges of the confidence scores.
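
For illustration only, the sampling step recited in claim 4 may be sketched as drawing a fixed number of pairs from each of several confidence-score ranges. The function name `sample_by_confidence`, the range boundaries, the per-range count, and the fixed random seed are assumptions and do not appear in the record.

```python
import random
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def sample_by_confidence(
    confidence: Dict[Pair, float],
    edges: Sequence[float],
    per_range: int,
    seed: int = 0,
) -> List[Pair]:
    """Draw up to `per_range` pairs from each range [edges[i], edges[i+1])."""
    rng = random.Random(seed)
    n_ranges = len(edges) - 1
    buckets: List[List[Pair]] = [[] for _ in range(n_ranges)]
    for pair, score in confidence.items():
        for i in range(n_ranges):
            is_last = i == n_ranges - 1
            # The last range is closed on the right so the top score is kept.
            if edges[i] <= score < edges[i + 1] or (is_last and score == edges[-1]):
                buckets[i].append(pair)
                break
    sample: List[Pair] = []
    for bucket in buckets:
        sample.extend(rng.sample(bucket, min(per_range, len(bucket))))
    return sample


# Hypothetical confidence scores for six candidate pairs.
confidence = {
    ("a", "b"): 0.1, ("c", "d"): 0.2,
    ("e", "f"): 0.6, ("g", "h"): 0.7,
    ("i", "j"): 0.9, ("k", "l"): 1.0,
}
picked = sample_by_confidence(confidence, edges=[0.0, 0.5, 0.8, 1.0], per_range=1)
print(len(picked))
```

Sampling per range, rather than uniformly, keeps low-, mid-, and high-confidence pairs all represented in the training data under this sketch.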


5.	(Original) The method of claim 4, wherein generating the pairs of the entities based on the similarities in the attributes of the entities comprises:
obtaining a pair of entities from a mapping in an inverted index of an attribute; and
when a similarity between values of the attribute associated with the pair meets a threshold, adding the pair to the pairs of the entities.
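
For illustration only, the pair generation recited in claim 5 may be sketched with an inverted index keyed on name tokens. The tokenization, the use of `difflib.SequenceMatcher` as the similarity measure, the function name `candidate_pairs`, and the sample names are assumptions; the record does not specify any of them.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations
from typing import Dict, List, Set, Tuple


def candidate_pairs(names: Dict[str, str], min_similarity: float) -> Set[Tuple[str, str]]:
    """Pair entities that share an index key and whose names are similar enough."""
    # Inverted index: token -> ids of entities whose name contains the token.
    index: Dict[str, List[str]] = defaultdict(list)
    for entity_id, name in names.items():
        for token in set(name.lower().split()):
            index[token].append(entity_id)

    pairs: Set[Tuple[str, str]] = set()
    for ids in index.values():
        # Only entities mapped to the same key are ever compared.
        for a, b in combinations(sorted(ids), 2):
            similarity = SequenceMatcher(None, names[a].lower(), names[b].lower()).ratio()
            if similarity >= min_similarity:
                pairs.add((a, b))
    return pairs


# Hypothetical entity names.
names = {"e1": "Acme Corp", "e2": "Acme Corporation", "e3": "Globex Corp"}
pairs = candidate_pairs(names, min_similarity=0.7)
print(pairs)
```

Restricting comparisons to entities that share an index key avoids scoring every possible pair, which is the usual motivation for such an index.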


6.	(Original) The method of claim 4, wherein the attributes comprise at least one of:
a name;
an address;
a phone number; and
a Uniform Resource Locator (URL).


7.	(Original) The method of claim 1, wherein selecting the training data for the first machine learning model based on the confidence scores representing the likelihoods that the pairs of the entities are duplicates comprises:
applying a second machine learning model to additional features for the pairs of the entities to generate the confidence scores.


8.	(Original) The method of claim 7, wherein: 
the additional features comprise a similarity between one or more attributes of a first entity in a pair of entities and one or more corresponding attributes of a second entity in the pair of entities, and
the one or more attributes and the one or more corresponding attributes comprise at least one of:
a name;
a location;
an industry;
a phone number; 
a Uniform Resource Locator (URL); and
a logo.


9.	(Original) The method of claim 7, wherein the additional features comprise at least one of:
a network distance between a first creator of a first entity in a pair of entities and a second creator of a second entity in the pair of entities; and
a comparison of a first size of the first entity and a second size of the second entity.


10.	(Original) The method of claim 1, wherein selecting the training data for the first machine learning model based on the confidence scores representing the likelihoods that the pairs of the entities are duplicates comprises:
determining the labels based on user annotations of the labels and assessments of accuracy of the user annotations.
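
For illustration only, the label determination recited in claim 10 may be sketched as a vote in which each user annotation is weighted by an assessed accuracy for the annotating user. The weighting scheme, the function name `resolve_label`, and the sample annotators and accuracies are assumptions not found in the record.

```python
from typing import Dict, List, Tuple


def resolve_label(votes: List[Tuple[str, bool]], accuracy: Dict[str, float]) -> bool:
    """Return True (duplicate) if accuracy-weighted votes favor 'duplicate'."""
    weight_duplicate = sum(accuracy[who] for who, is_dup in votes if is_dup)
    weight_distinct = sum(accuracy[who] for who, is_dup in votes if not is_dup)
    return weight_duplicate > weight_distinct


# Hypothetical assessed accuracy of each annotator.
accuracy = {"ann": 0.95, "bob": 0.55, "cy": 0.30}
# One accurate annotator says "duplicate"; two less accurate ones disagree.
label = resolve_label([("ann", True), ("bob", False), ("cy", False)], accuracy)
print(label)
```

Under this sketch a single highly accurate annotator can outweigh two less accurate annotators, which a simple majority vote would not allow.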


11.	(Original) The method of claim 1, further comprising:
identifying a second subset of the additional pairs of entities as additional duplicate entities based on the scores generated by the first machine learning model from the values of the features for the additional pairs of entities and a second threshold associated with the scores; and
for each pair in the second subset of the additional pairs of the entities, selecting a canonical entity based on a user annotation of the canonical entity for the pair.
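
For illustration only, the two thresholds of claims 1 and 11 may be sketched as routing each scored pair either to automatic canonical selection or to selection by user annotation. The assumption that the second threshold is lower than the first, the function name `route_pairs`, and the sample scores are hypothetical and are not stated in the record.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def route_pairs(
    scores: Dict[Pair, float],
    first_threshold: float,
    second_threshold: float,
) -> Tuple[List[Pair], List[Pair]]:
    """Split pairs into (first subset, second subset) by score."""
    automatic: List[Pair] = []         # canonical entity chosen from features
    needs_annotation: List[Pair] = []  # canonical entity chosen by a user
    for pair, score in scores.items():
        if score >= first_threshold:
            automatic.append(pair)
        elif score >= second_threshold:
            needs_annotation.append(pair)
    return automatic, needs_annotation


# Hypothetical scores from the first machine learning model.
pair_scores = {("a", "b"): 0.98, ("c", "d"): 0.80, ("e", "f"): 0.20}
automatic, needs_annotation = route_pairs(pair_scores, 0.95, 0.70)
print(automatic, needs_annotation)
```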


12.	(Original) The method of claim 1, wherein the features comprise:
a density of connections between a first set of members associated with a first entity and a second set of members associated with a second entity; and
a similarity between one or more attributes of the first entity and one or more corresponding attributes of the second entity.
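
For illustration only, the density-of-connections feature recited in claim 12 may be sketched as the fraction of possible member-to-member connections across the two sets that actually exist. This definition of density, the function name `connection_density`, and the sample members are assumptions; the record does not define how the density is computed.

```python
from typing import FrozenSet, Set


def connection_density(
    members_a: Set[str],
    members_b: Set[str],
    connections: Set[FrozenSet[str]],
) -> float:
    """Fraction of cross-set member pairs that are connected."""
    possible = len(members_a) * len(members_b)
    if possible == 0:
        return 0.0
    actual = sum(
        1
        for a in members_a
        for b in members_b
        if a != b and frozenset((a, b)) in connections
    )
    return actual / possible


# Hypothetical members of two entities and their undirected connections.
members_first = {"u1", "u2"}
members_second = {"u3", "u4"}
connections = {
    frozenset(("u1", "u3")),
    frozenset(("u2", "u3")),
    frozenset(("u2", "u4")),
}
density = connection_density(members_first, members_second, connections)
print(density)
```

In this example three of the four possible cross-set connections exist, giving a density of 0.75.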


13.	(Original) The method of claim 1, wherein the entities comprise companies in the online system.


14.	(Currently Amended) A system, comprising:
one or more processors; and
memory storing instructions that, when executed by the one or more processors, cause the system to:
obtain a first machine learning model that predicts a likelihood that a pair of entities in an online system are duplicates based on features that comprise a density of connections between a first set of members associated with a first entity in the pair and a second set of members associated with a second entity in the pair;
identify a first subset of pairs of the entities as duplicate entities based on scores generated by the first machine learning model from values of the features for the pairs of the entities and a first threshold associated with the scores; 
for each pair in the first subset of the pairs of the entities, determine a canonical entity based on additional features for the pair; 
update content outputted in a user interface of the online system based on the canonical entity;
sample training data for the first machine learning model based on confidence scores from a second machine learning model, wherein the confidence scores represent likelihoods that pairs of entities in the online system are duplicates; 
determine labels in the training data based on user annotations of the labels and assessments of accuracy of the user annotations; and
update parameters of the first machine learning model based on the labels and the features in the training data.


15.	(Canceled) 


16.	(Currently Amended) The system of claim 14 
generating the pairs of the entities based on inverted indexes associated with values of the attributes and similarities in the attributes of the entities; and
sampling the training data to include subsets of the pairs of the entities associated with different ranges of the confidence scores.


17.	(Original) The system of claim 16, wherein the attributes comprise at least one of:
a name;
an address;
a phone number; and
a Uniform Resource Locator (URL).

18.	(Original) The system of claim 14, wherein the features comprise:
a density of connections between a first set of members associated with a first entity and a second set of members associated with a second entity; and
a similarity between one or more attributes of the first entity and one or more corresponding attributes of the second entity.

19.	(Original) The system of claim 14, wherein the additional features comprise at least one of:
a completeness of data for each entity in the pair; 
a number of members of the online system associated with each entity in the pair;
a level of interaction between the members of the online system and each entity in the pair; and
a user annotation of a first entity in the pair as the canonical entity.

20.	(Currently Amended) A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method, the method comprising:
selecting training data for a first machine learning model based on confidence scores representing likelihoods that pairs of entities in an online system are duplicates;
updating parameters of the first machine learning model based on features and labels in the training data;
identifying a first subset of additional pairs of the entities as duplicate entities based on scores generated by the first machine learning model from values of the features for the additional pairs and a first threshold associated with the scores; and
updating content outputted in a user interface of the online system based on the identified first subset of the additional pairs by, for each pair in the first subset of the additional pairs of the entities, determining a canonical entity based on additional features for the pair; and omitting output of a remaining entity that is not the canonical entity in the pair in the user interface of the online system.


Reasons for Allowance
Claims 1, 3-14 and 16-20 are allowed. 
The following is an examiner’s statement of reasons for allowance: 
Regarding claim 1, the claimed invention contains the following underlined features which, when combined with the other features of the claim, the prior art of record fails to anticipate or render obvious at the time the instant invention was filed:
A method, comprising:
selecting training data for a first machine learning model based on confidence scores representing likelihoods that pairs of entities in an online system are duplicates;
updating, by one or more computer systems, parameters of the first machine learning model based on features and labels in the training data;
identifying, by the one or more computer systems, a first subset of additional pairs of the entities as duplicate entities based on scores generated by the first machine learning model from values of the features for the additional pairs and a first threshold associated with the scores; and
updating, by the one or more computer systems, content outputted in a user interface of the online system based on the identified first subset of the additional pairs by, for each pair in the first subset of the additional pairs of the entities, 
determining a canonical entity based on additional features for the pair; and 
omitting output of a remaining entity that is not the canonical entity in the pair in the user interface of the online system.

Grady et al. (US 2020/0193165, “Grady”) discloses techniques for selectively associating frames with content entities and using such associations to dynamically generate web content related to the content entities by comparing the selected frames to given frames of an entity. Upon comparing a detected group of pixels with predefined facial information for the selected frames and determining that a threshold level of similarity exists between the two, Grady can determine that the frame in question contains a depiction of the selected entity.
Kale et al. (US 2020/0334557, “Kale”) discloses techniques for assessing quality of predictions by machine learning models by combining a number of mathematical methods. Kale uses influence functions to estimate the influence of training data points on a particular prediction made by a model in order to explain how the prediction is verified.
Samel et al. (US 2019/0354810, “Samel”) discloses a technique for processing training data for a machine learning model. The technique includes training the machine learning model using training data with a set of features and a set of original labels associated with the set of features.

Regarding claims 14 and 20, the claims contain features similar to those recited in claim 1 and thus are allowed for the same reasons as stated above.
Regarding claims 3-13 and 16-19, these claims depend from claims 1 and 14, respectively, and thus are allowed for the same reasons stated above for claim 1.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Harry H. Kim, whose telephone number is 571-272-5009 and whose email address is harry.kim2@uspto.gov. The examiner can normally be reached from 9:00 a.m. to 6:00 p.m.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Derrick Ferris, can be reached at 571-272-3123.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (in USA or Canada) or 571-272-1000.

/HARRY H KIM/
Primary Examiner, Art Unit 2411