DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. This action is responsive to the Request for Continued Examination filed on 02/18/2022, which refers to the Amendment filed on 02/08/2022. Claims 1-20 are pending in the case. Claims 1, 8, and 14 are independent claims.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 02/08/2022 has been entered.
 
Response to Arguments
Applicant's prior art arguments have been fully considered and are persuasive. Accordingly, these rejections are hereby withdrawn.

EXAMINER’S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in an interview with Nilesh Amin on 05/11/2022.
Claims 1, 8, and 14 have been amended as follows:
1. (Currently Amended) A system, comprising:
a memory that stores computer executable components;
a processor, operably coupled to the memory, and that executes the computer executable components stored in the memory, wherein the computer executable components comprise:
a similarity component that determines respective similarity scores of a similarity metric between entities of a dataset based on a vector space model; and
a sampling component that generates training data to train a machine learning model to predict whether a query assertion regarding the dataset is true or false, wherein the training data comprises a positive sample assertion, and a negative sample assertion generated from the positive sample assertion based on performing a negative sampling analysis on the dataset based on the similarity metrics, comprising:
wherein the positive sample assertion comprises a first entity of the dataset, a second entity of the dataset, and a defined relationship between the first entity and the second entity, wherein the defined relationship is true, and
substituting the first entity of the positive sample assertion with a third entity of the dataset to generate the negative sample assertion, wherein the third entity has a similarity score for the similarity metric with respect to the first entity exceeding a predefined threshold, the defined relationship is false between the second entity and the third entity, and the first entity, the second entity, and the third entity are distinct.

2. (Previously Presented) The system of claim 1, wherein the sampling component generates a plurality of negative sample assertions from a plurality of positive sample assertions based on performing the negative sampling analysis.

3. (Previously Presented) The system of claim 2, further comprising:
a probability component that determines a probability that replacing the first entity with the third entity will result in the defined relationship between the second entity and the third entity being false.

4. (Previously Presented) The system of claim 1, further comprising:
a model component that populates the vector space model with respective estimated vectors for the entities using a deep learning algorithm.

5. (Previously Presented) The system of claim 1, wherein the similarity metrics are a type of metric selected from a group consisting of a cosine metric, a Euclidean metric, and a Manhattan metric.

6. (Previously Presented) The system of claim 1, further comprising:
a machine learning component that derives, using the machine learning model, a lesson from the dataset based on the entities and the negative sample assertion.

7. (Previously Presented) The system of claim 1, wherein the memory is comprised within a cloud computing environment.

8. (Currently Amended) A computer-implemented method, comprising:
determining, by a system operatively coupled to a processor, respective similarity scores of a similarity metric between entities of a dataset based on a vector space model; and
generating, by the system, training data to train a deep learning algorithm to predict whether a query assertion regarding the dataset is true or false, wherein the training data comprises a positive sample assertion, and a negative sample assertion generated from the positive sample assertion based on performing a negative sampling analysis on the dataset based on the similarity metrics, comprising:
wherein the positive sample assertion comprises a first entity of the dataset, a second entity of the dataset, and a defined relationship between the first entity and the second entity, wherein the defined relationship is true, and
substituting the first entity of the positive sample assertion with a third entity of the dataset to generate the negative sample assertion, wherein the third entity has a similarity score for the similarity metric with respect to the first entity exceeding a predefined threshold, the defined relationship is false between the second entity and the third entity, and the first entity, the second entity, and the third entity are distinct.

9. (Previously Presented) The computer-implemented method of claim 8, wherein the generating further comprises generating a plurality of negative sample assertions from a plurality of positive sample assertions based on performing the negative sampling analysis.

10. (Previously Presented) The computer-implemented method of claim 8, further comprising:
determining, by the system, a probability that replacing the first entity with the third entity will result in the defined relationship between the second entity and the third entity being false.

11. (Previously Presented) The computer-implemented method of claim 8, further comprising:
populating, by the system, the vector space model with respective estimated vectors for the entities using a deep learning algorithm.

12. (Previously Presented) The computer-implemented method of claim 8, wherein the similarity metrics are a type of metric selected from a group consisting of a cosine metric, a Euclidean metric, and a Manhattan metric.

13. (Previously Presented) The computer-implemented method of claim 8, further comprising:
deriving, by the system, using the deep learning algorithm, a lesson from the dataset based on the entities and the negative sample assertion.

14. (Currently Amended) A computer program product for similarity based negative sampling, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to:
determine, by the processor, respective similarity scores of a similarity metric between entities of a dataset based on a vector space model; and
generate, by the processor, training data to train a neural net to predict whether a query assertion regarding the dataset is true or false, wherein the training data comprises a positive sample assertion, and a negative sample assertion generated from the positive sample assertion based on performing a negative sampling analysis on the dataset based on the similarity metrics, comprising:
wherein the positive sample assertion comprises a first entity of the dataset, a second entity of the dataset, and a defined relationship between the first entity and the second entity, wherein the defined relationship is true, and
substituting the first entity of the positive sample assertion with a third entity of the dataset to generate the negative sample assertion, wherein the third entity has a similarity score for the similarity metric with respect to the first entity exceeding a predefined threshold, the defined relationship is false between the second entity and the third entity, and the first entity, the second entity, and the third entity are distinct.

15. (Previously Presented) The computer program product of claim 14, wherein the generating further comprises generating a plurality of negative sample assertions from a plurality of positive sample assertions based on performing the negative sampling analysis.

16. (Previously Presented) The computer program product of claim 15, wherein the program instructions further cause the processor to:
determine, by the processor, a probability that replacing the first entity with the third entity will result in the defined relationship between the second entity and the third entity being false.

17. (Previously Presented) The computer program product of claim 15, wherein the program instructions further cause the processor to:
populate, by the processor, the vector space model with respective estimated vectors for the entities using a deep learning algorithm.

18. (Previously Presented) The computer program product of claim 15, wherein the similarity metrics are a type of metric selected from a group consisting of a cosine metric, a Euclidean metric, and a Manhattan metric.

19. (Previously Presented) The computer program product of claim 15, wherein the program instructions further cause the processor to:
derive, by the processor, using the neural net, a lesson from the dataset based on the entities and the negative sample assertion.

20. (Previously Presented) The computer program product of claim 15, wherein the processor performs the negative sampling analysis in a cloud computing environment.

Allowable Subject Matter
Claims 1-20 are allowed as amended above. 
The following is an examiner’s statement of reasons for allowance:
The prior art of record does not teach, make obvious, or suggest the claim limitations of independent claims 1, 8, and 14, as summarized below.
Osesina et al. (U.S. Pat. App. Pub. No. 2021/0240682) teaches entity resolution, which involves the analysis and identification of datasets that refer to the same real-world entity (e.g., a person). For example, some individuals may have numerous public records that identify the individual differently. As a simple illustration, several records associated with Jane Doe at her current home address may exist: one record with her name listed as J. Doe, a second record as Jane H. Doe, a third record as Doe, Jane, and a fourth record as Jan Doe (a misspelling). Conceivably, a bulk mailing could unwittingly send multiple mailings to Jane Doe at her current address, one for each name variation. Osesina's entity feature constructor includes an entity representation model that converts, or maps, each entity into a vector. The entity relation predictor then makes entity relation predictions based on the mapped vectors and the similarity scores between entities. The approach takes advantage of limited user input on records that are known to belong to the same entity to generate both positive and negative labels for the link features. In particular, the approach generates linkage features labeled "same entity" by using records from the same entity; in other words, the approach successively applies the similarity metrics in the link features to all the record pairs in an entity. A "different entities" label is generated by comparing records in an entity to those in a different entity.
Tan et al. (U.S. Pat. App. Pub. No. 2018/0025008) teaches homogeneous entity grouping. Entities that have a similar meaning are identified and grouped into entity groups for knowledge base construction. The relation of similarity or non-similarity for an entity pair is predicted as a binary relationship. A group centroid entity may be selected to represent each group based on one or more factors, such as its generality or popularity, and a negative entity pair can be extracted across two predicted groups.
In contrast, the disclosed invention, as recited in independent claims 1, 8, and 14, requires a sampling component that generates training data to train a machine learning model to predict whether a query assertion regarding the dataset is true or false, wherein the training data comprises a positive sample assertion, and a negative sample assertion generated from the positive sample assertion based on performing a negative sampling analysis on the dataset based on the similarity metrics, comprising: wherein the positive sample assertion comprises a first entity of the dataset, a second entity of the dataset, and a defined relationship between the first entity and the second entity, wherein the defined relationship is true, and substituting the first entity of the positive sample assertion with a third entity of the dataset to generate the negative sample assertion, wherein the third entity has a similarity score for the similarity metric with respect to the first entity exceeding a predefined threshold, the defined relationship is false between the second entity and the third entity, and the first entity, the second entity, and the third entity are distinct.
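For illustration only, and not as a description of applicant's implementation, the following Python sketch shows one way the distinguishing limitation above could be realized. It rests on several assumptions that are not recited in the claims: entity vectors are supplied as a plain dictionary of hypothetical toy data, cosine similarity is used (one of the metrics recited in claims 5, 12, and 18), an assertion is a (first entity, relationship, second entity) triple, and any triple absent from the set of known-true assertions is treated as false (a closed-world assumption). The function names cosine_similarity, similar_entities, and negative_sample are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical representation: (first entity, defined relationship, second entity).
Triple = Tuple[str, str, str]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_entities(entity: str, embeddings: Dict[str, Sequence[float]],
                     threshold: float) -> List[str]:
    """Other entities whose similarity to `entity` exceeds `threshold`, most similar first."""
    base = embeddings[entity]
    scored = [(cosine_similarity(base, vec), name)
              for name, vec in embeddings.items() if name != entity]
    return [name for score, name in sorted(scored, reverse=True) if score > threshold]


def negative_sample(positive: Triple, embeddings: Dict[str, Sequence[float]],
                    known_true: Set[Triple], threshold: float) -> Optional[Triple]:
    """Replace the first entity of a true triple with a similar third entity.

    Returns None when no candidate above the threshold yields a false triple.
    """
    first, relation, second = positive
    for candidate in similar_entities(first, embeddings, threshold):
        if candidate == second:
            continue  # the first, second, and third entities must be distinct
        corrupted = (candidate, relation, second)
        if corrupted not in known_true:  # closed-world assumption: unknown means false
            return corrupted
    return None
```

With toy vectors in which "Lyon" is the entity most similar to "Paris", negative_sample(("Paris", "capital_of", "France"), ...) with a threshold of 0.9 returns ("Lyon", "capital_of", "France"). Choosing a highly similar replacement, rather than a random one, tends to yield harder negatives, although under the closed-world assumption some of them may in fact be true.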
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.” 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Casey R. Garner, whose telephone number is 571-272-2467. The examiner can normally be reached Monday to Friday, 8am to 5pm, Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Alexey Shmatov, can be reached at 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/Casey R. Garner/Examiner, Art Unit 2123