Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

2. 	This Office Action is in response to Applicants' amendment, filed on 04/07/2022, to the Non-Final Office Action. Claims 1, 10, and 14 have been amended. Claims 1-20 are pending in this Office Action.

EXAMINER’S AMENDMENT
3.	An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
 	Authorization for this examiner's amendment was given in a telephone interview with Griffin D. Kennedy (Reg. No.: 76,793) on 05/02/2022 at 801-203-3546.

In the claims:

Please replace claims 1, 8, 10, 15, 17, and 18 with amended claims 1, 8, 10, 15, 17, and 18 below.

Amendments to the Claims:

1.		(Currently Amended) A non-transitory computer-readable medium storing instructions thereon that, when executed by at least one processor, cause a computing device to:
determine an estimated amount of overlap between a first set of data samples and a second set of data samples by:
generating, utilizing a sketching algorithm, a first sketch vector comprising a first set of bins for the first set of data samples and a second sketch vector comprising a second set of bins for the second set of data samples by generating, utilizing a one permutation hashing algorithm, a first one permutation hashing vector comprising the first set of bins for the first set of data samples and a second one permutation hashing vector comprising the second set of bins for the second set of data samples;
determining an equal bin similarity estimator, a lesser bin similarity estimator, and a greater bin similarity estimator that each provide a value indicating similarity between the first set of data samples and the second set of data samples based on comparisons between the first set of bins of the first sketch vector and the second set of bins of the second sketch vector; and
generating an overlap estimation between the first set of data samples and the second set of data samples utilizing variance metrics corresponding to the equal bin similarity estimator, the lesser bin similarity estimator, and the greater bin similarity estimator; and
provide the overlap estimation for display via a client device in relation to a visual representation of the first set of data samples and a visual representation of the second set of data samples.


8.		(Currently Amended) The non-transitory computer-readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to determine the equal bin similarity estimator by determining a Jaccard similarity based on the comparisons between the first set of bins of the first sketch vector and the second set of bins of the second sketch vector.

10.		(Currently Amended) A system comprising:
one or more memory devices comprising a first set of data samples, a second set of data samples, and a sketching algorithm; and
one or more server devices configured to cause the system to:
generate, utilizing the sketching algorithm, a first sketch vector comprising a first set of bins for the first set of data samples and a second sketch vector comprising a second set of bins for the second set of data samples by generating, utilizing a one permutation hashing algorithm, a first one permutation hashing vector comprising the first set of bins for the first set of data samples and a second one permutation hashing vector comprising the second set of bins for the second set of data samples;
determine an equal bin similarity estimator, a lesser bin similarity estimator, and a greater bin similarity estimator that each provide a value indicating similarity between the first set of data samples and the second set of data samples by comparing the first set of bins of the first sketch vector and the second set of bins of the second sketch vector to determine whether a bin value of a given bin from the first set of bins is equal to, less than, or greater than a bin value of a corresponding bin from the second set of bins;
determine an equal bin variance metric indicating a measure of variance corresponding to the equal bin similarity estimator, a lesser bin variance metric indicating a measure of variance corresponding to the lesser bin similarity estimator, and a greater bin variance metric indicating a measure of variance corresponding to the greater bin similarity estimator; and
generate an overlap estimation between the first set of data samples and the second set of data samples from the equal bin similarity estimator, the lesser bin similarity estimator, or the greater bin similarity estimator based on comparing the equal bin variance metric, the lesser bin variance metric, and the greater bin variance metric.

15.		(Currently Amended) The system of claim 10, wherein the one or more server devices are further configured to generate, utilizing the sketching algorithm, the first sketch vector comprising the first set of bins for the first set of data samples and the second sketch vector comprising the second set of bins for the second set of data samples by: 
populating the first set of bins with bin values corresponding to a first distribution segment trait; and
populating the second set of bins with bin values corresponding to a second distribution segment trait.

17.		(Currently Amended) A computer-implemented method for efficiently determining amounts of overlap between digital data repositories comprising:
determining an estimated amount of overlap between a first set of data samples and a second set of data samples by:
generating, utilizing a sketching algorithm, a first sketch vector comprising a first set of bins for the first set of data samples and a second sketch vector comprising a second set of bins for the second set of data samples by generating, utilizing a one permutation hashing algorithm, a first one permutation hashing vector comprising the first set of bins for the first set of data samples and a second one permutation hashing vector comprising the second set of bins for the second set of data samples;
determining an equal bin similarity estimator, a lesser bin similarity estimator, and a greater bin similarity estimator that each provide a value indicating similarity between the first set of data samples and the second set of data samples based on comparisons between the first set of bins of the first sketch vector and the second set of bins of the second sketch vector; and
generating an overlap estimation between the first set of data samples and the second set of data samples utilizing variance metrics corresponding to the equal bin similarity estimator, the lesser bin similarity estimator, and the greater bin similarity estimator; and
providing the overlap estimation for display via a client device in relation to a visual representation of the first set of data samples and a visual representation of the second set of data samples. 

18.		(Currently Amended) The computer-implemented method of claim 17, wherein generating the first sketch vector for the first set of data samples and the second sketch vector for the second set of data samples comprises generating the first one permutation hashing vector and the second one permutation hashing vector to include hash values corresponding to one or more distribution segment traits.
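
For orientation only, and not as part of the claims or the record, the sketch-and-compare steps recited in claims 1, 8, 10, and 17 can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: the function names, the use of SHA-256 as the single hash function, the number of bins, and the treatment of an empty bin as larger than any hash value are all hypothetical choices not taken from the application. The equal-bin estimate uses the standard one permutation hashing estimate of Jaccard similarity (matching bins divided by bins that are not empty in both sketches). The lesser bin and greater bin estimators and the variance metrics are not defined in this Office Action, so the example stops at the raw bin counts.

```python
import hashlib

EMPTY = float("inf")  # assumption: an empty bin compares as larger than any hash


def oph_sketch(samples, num_bins=8, hash_bits=32):
    """One permutation hashing: hash each sample once, map the hash to one
    of num_bins equal slices of the hash range, keep the minimum per bin."""
    sketch = [EMPTY] * num_bins
    for sample in samples:
        digest = hashlib.sha256(str(sample).encode("utf-8")).digest()
        h = int.from_bytes(digest[: hash_bits // 8], "big")
        idx = (h * num_bins) >> hash_bits  # always in range(num_bins)
        if h < sketch[idx]:
            sketch[idx] = h
    return sketch


def compare_bins(sketch_a, sketch_b):
    """Count bins where A's value is equal to, less than, or greater than
    B's value; bins empty in both sketches are counted separately."""
    counts = {"equal": 0, "lesser": 0, "greater": 0, "both_empty": 0}
    for a, b in zip(sketch_a, sketch_b):
        if a == EMPTY and b == EMPTY:
            counts["both_empty"] += 1
        elif a == b:
            counts["equal"] += 1
        elif a < b:
            counts["lesser"] += 1
        else:
            counts["greater"] += 1
    return counts


def equal_bin_jaccard(counts, num_bins):
    """Standard OPH Jaccard estimate: matches / (bins - jointly empty bins)."""
    usable = num_bins - counts["both_empty"]
    return counts["equal"] / usable if usable else 0.0


if __name__ == "__main__":
    first = oph_sketch(range(0, 60))
    second = oph_sketch(range(30, 90))
    counts = compare_bins(first, second)
    print(counts, equal_bin_jaccard(counts, len(first)))
```

Mapping the hash to a bin with a multiply-and-shift keeps the bin index in range for any bin count, which is one reasonable design choice among several.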

Allowable Subject Matter
4. 	Claims 1-20 are allowed.
	The closest prior art of record is as follows.
	US Patent Publication No. 2016/0189186 A1 of Fabrikant et al. (hereinafter Fabrikant) teaches computer-implemented methods and systems of determining semantic place data that include receiving a plurality of location data reports from a plurality of mobile devices, partitioning them into localized segments, and estimating a geographic region bucket for each segment.
	US Patent Publication No. 2017/0091795 A1 of Mansour et al. (hereinafter Mansour) teaches a method to identify local trade areas. An example method includes selecting, with a processor, census block groups (CBGs) associated with a retailer location; identifying, with the processor, a plurality of stores within the selected CBGs and associated all commodities volume values for respective ones of the plurality of stores; calculating, with the processor, similarity index values associated with respective pairs of the plurality of stores; and generating, with the processor, local trade areas (LTAs) of subgroups of the plurality of stores based on a comparison of the similarity index values to a similarity threshold value.
	US Patent Publication No. 2018/0181609 A1 of Chen et al. (hereinafter Chen) teaches a method for de-duplicating electronic job postings. In one embodiment, the method includes obtaining a first set of data indicative of a job posting, the first set of data including one or more characteristics associated with the job posting, and accessing a second set of data indicative of a job posting cluster, the job posting cluster including one or more previous job postings.
	US Patent Publication No. 2021/0034914 A1 of Bansal (hereinafter Bansal) teaches a method for detecting an emergency vehicle, in which a plurality of images may be taken from a perspective of an autonomous vehicle, one or more gates representing a region of interest at a respective distance from the vehicle may be generated for the images, and a plurality of lights may be detected within the one or more gates.
Also, Fabrikant, Mansour, Chen, and Bansal fail to teach generating, utilizing a sketching algorithm, a first sketch vector comprising a first set of bins for the first set of data samples and a second sketch vector comprising a second set of bins for the second set of data samples by generating, utilizing a one permutation hashing algorithm, a first one permutation hashing vector comprising the first set of bins for the first set of data samples and a second one permutation hashing vector comprising the second set of bins for the second set of data samples.
Moreover, the prior art of record, such as Fabrikant, Mansour, Chen, and Bansal, does not teach or fairly suggest the following steps: determine an equal bin variance metric indicating a measure of variance corresponding to the equal bin similarity estimator, a lesser bin variance metric indicating a measure of variance corresponding to the lesser bin similarity estimator, and a greater bin variance metric indicating a measure of variance corresponding to the greater bin similarity estimator; and generate an overlap estimation between the first set of data samples and the second set of data samples from the equal bin similarity estimator, the lesser bin similarity estimator, or the greater bin similarity estimator based on comparing the equal bin variance metric, the lesser bin variance metric, and the greater bin variance metric.

The dependent claims, being definite, further limiting, and fully enabled by the specification, are also allowed.

5. 	Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled "Comments on Statement of Reasons for Allowance."

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Hwa, whose telephone number is 571-270-1285. The examiner can normally be reached from 9:00 am – 5:30 pm EST. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tamara Kyle, can be reached at 571-272-4241. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
05/02/2022	
								
/SHYUE JIUNN HWA/
Primary Examiner, Art Unit 2156