DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification

The lengthy specification has not been checked to the extent necessary to determine the presence of all possible minor errors. Applicant’s cooperation is requested in correcting any errors of which applicant may become aware in the specification.
Claim Objections
Claims 1-20 are objected to because of the following informalities: claims 1, 2, 4, 7, 8, 9, 11, 14, 15, 16, and 18 contain enumerated items labeled (a), (b), etc. Claims 3, 5-6, 10, 12-13, 17, and 19-20 are objected to because they depend from objected-to independent claims 1, 8, and 15. Appropriate correction is required.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Oprisa ("A MinHash Approach for Clustering Large Collections of Binary Programs").
Oprisa teaches:
1. A method comprising:
performing, at one or more computing devices, one or more iterations of a similarity analysis task with respect to a plurality of entities, wherein an iteration of the one or more iterations comprises (pg 159 - Algorithm 1 outlines the clustering process. It starts by creating a set for each sample from the input data (lines 1-3). In the repeat-until loop that follows, a sequence of operations is performed on the samples, until a termination condition is met (lines 4-18)):
identifying, from a plurality of sets using a minimum hash based similarity score, a first set and a second set as operands for a set operation, wherein individual ones of the plurality of sets represent one or more entities of the plurality of entities, and wherein the minimum hash based similarity score is obtained without applying a hash function to at least one set of the first and second sets (pg 159 - Inside the loop, the entire set of samples is split into smaller partitions by the function SPLIT-SAMPLES() that is detailed in Algorithm 2 (line 5). For each partition, every pair of samples is considered for similarity (line 7). First, the representatives of the two samples are found (lines 8 and 9). If they have the same representative, it means they already belong to the same cluster so no further step is required for that pair. If not, we need to compute the distance between the two samples (line 11). If the distance is smaller than the threshold θ, we just found two similar samples that belong to different clusters. In this case, the two clusters must be joined together, by performing the UNION operation on their sets (line 13));
generating, using respective minimum hash information arrays corresponding to the first and second sets and respective contributor count arrays corresponding to the first and second sets, (a) a minimum hash information array of a derived set, wherein the derived set is obtained by applying the set operation to the first and second sets (pg 160 - The algorithm starts by generating a random MinHash function (line 1). As explained in the previous section, all we need to do is select two random integers a and b, in order to have a random permutation. The MinHash function will just compute the minimum value of the features from the given set, after applying the random permutation, as in Equation 5.) and (b) a contributor count array of the derived set, wherein an entry at a particular index in the contributor count array is indicative of a count of child sets of the derived set whose minimum hash information array meets a criterion with respect to an entry at the particular index in the minimum hash information array for the derived set (pg 160 - The algorithm will also need an associative array (line 3) that represents the mapping between MinHash values and the sets of samples that have the same value); and
storing, as part of an input for a subsequent iteration, the generated minimum hash information array and the generated contributor count array (pg 160 - In other words, the set of samples received as the function argument is split into several parts by the MinHash value computed on it. These partitions are processed in lines 8-15, where the function SPLIT-CONDITION() decides whether a partition will be further split or left as it is. The depth parameter of the function contains the number of splits performed so far. At each recursive call (line 10), the depth is bigger. The value for this parameter will be used in the SPLIT-CONDITION() function in order to decide if we must performed a new split); and
providing, from the one or more computing devices after a task termination criterion has been met, an indication of a result of the similarity analysis task (pg 160 - the TERMINATION-CONDITION() function, that decides whether the main loop of the algorithm should stop or perform another iteration).
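The array bookkeeping recited in claim 1 can be made concrete with a minimal, hypothetical sketch for the union case only. It assumes (for illustration; this is neither the applicant's disclosed method nor Oprisa's) that each set carries a MinHash array and a contributor count array of equal length; that the derived MinHash entry of a union is the elementwise minimum of the operands' entries, which is a standard property of MinHash signatures; that the claimed "criterion" is equality with the derived entry; and that an operand entry with a zero contributor count is only a bound (cf. claims 5-6) and is therefore not counted. The names SetSketch and union_sketch are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SetSketch:
    """Hypothetical per-set state: one MinHash entry and one contributor count per hash index."""
    minhash: List[int]
    contrib: List[int]


def union_sketch(first: SetSketch, second: SetSketch) -> SetSketch:
    """Derive the sketch of (first UNION second) under the stated assumptions."""
    n = len(first.minhash)
    if not (len(first.contrib) == len(second.minhash) == len(second.contrib) == n):
        raise ValueError("all arrays must have one entry per hash function")
    merged: List[int] = []
    counts: List[int] = []
    for i in range(n):
        a, b = first.minhash[i], second.minhash[i]
        m = min(a, b)  # the union's minimum is the smaller operand minimum
        merged.append(m)
        # Assumed criterion: an operand contributes if it attains the derived minimum
        # with an exact entry (non-zero count); a zero count marks a mere bound.
        counts.append(
            int(a == m and first.contrib[i] > 0) + int(b == m and second.contrib[i] > 0)
        )
    return SetSketch(minhash=merged, contrib=counts)
```

For example, merging SetSketch([3, 9, 4], [1, 1, 1]) with SetSketch([5, 2, 4], [1, 1, 1]) yields MinHash entries [3, 2, 4] with contributor counts [1, 1, 2]; the tie at the last index is credited to both operands. Under this assumed reading, a derived entry whose count comes out as zero would itself be only a bound.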
2. The method as recited in claim 1 (see above rejection), wherein the set operation is one of: (a) a union operation or (b) a set difference operation (pg 159 - UNION(x, y): unification of the two sets containing x and y).
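The UNION(x, y) operation and the representative lookups quoted from Oprisa's Algorithm 1 (lines 8-13) follow the classic union-find (disjoint-set) pattern. The sketch below is a simplified, hypothetical illustration: it compares every pair in a single pass rather than partitioning with SPLIT-SAMPLES() and repeating until TERMINATION-CONDITION() is met, and it assumes the pairwise distance is the exact Jaccard distance on integer feature sets. The function names and the threshold parameter theta are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List


def jaccard_distance(x: FrozenSet[int], y: FrozenSet[int]) -> float:
    """1 - |X ∩ Y| / |X ∪ Y|; two empty sets are treated as identical."""
    union = x | y
    if not union:
        return 0.0
    return 1.0 - len(x & y) / len(union)


def cluster(samples: List[FrozenSet[int]], theta: float) -> List[int]:
    """One pass over all pairs: join the clusters of samples closer than theta.

    Returns, for each sample, the index of its cluster representative.
    """
    parent = list(range(len(samples)))  # each sample starts in its own set

    def find(i: int) -> int:
        # FIND the representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(samples)), 2):
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # same representative: already in the same cluster
        if jaccard_distance(samples[i], samples[j]) < theta:
            parent[rj] = ri  # UNION of the two clusters
    return [find(i) for i in range(len(samples))]
```

With samples {1, 2, 3}, {1, 2, 4}, and {7, 8} and theta = 0.6, the first two samples (Jaccard distance 0.5) are joined and the third stays alone, so cluster returns [0, 0, 2].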
3. The method as recited in claim 1 (see above rejection), wherein the minimum hash based similarity score is a Jaccard similarity score (pg 157 - The Jaccard distance between two features sets is defined in Equation 1).
4. The method as recited in claim 1 (see above rejection), wherein the minimum hash based similarity score is obtained by dividing a first positive integer by a second positive integer (pg 157 - Equation 1), the method further comprising performing, at the one or more computing devices:
determining a first number of elements of the contributor count array of the first set for which: (a) the contributor count is non-zero, and (b) a corresponding contributor count of the contributor count array of the second set is also non-zero (pg 160 - The algorithm will also need an associative array (line 3) that represents the mapping between MinHash values and the sets of samples that have the same value. In lines 4-7 the MinHash value for each sample is computed and the sample is inserted in the corresponding set);
determining a second number of elements of the contributor count array of the first set for which (a) the contributor count is zero, and (b) a corresponding minimum hash value of the hash information array of the first set exceeds the corresponding minimum hash value of the hash information array of the second set (pg 160 - The algorithm will also need an associative array (line 3) that represents the mapping between MinHash values and the sets of samples that have the same value. In lines 4-7 the MinHash value for each sample is computed and the sample is inserted in the corresponding set); and
setting the second positive integer to a sum of at least the first number and the second number (pg 158 - The proof considers a matrix with |M| lines and 2 columns. All the features present in the set X will be marked with 1 in the first column and all the features present in the set Y will be marked with 1 in the second column. The Jaccard distance between X and Y will be 1 minus the ratio between the number of lines with two values of 1 and the number of lines with at least a value of 1. The probability that h(X) = h(Y) is equal with the probability that after applying the permutation σ to the lines of the matrix, the first line with at least a value of 1 has both values 1. This probability is not altered by the permutation).
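The probability argument quoted above from Oprisa (the chance that h(X) = h(Y) equals the fraction of rows with two values of 1 among rows with at least one value of 1, i.e., the Jaccard similarity of X and Y) is what lets a MinHash signature estimate similarity. The sketch below assumes integer features smaller than a fixed prime and uses hash functions of the form (a*x + b) mod p, built from two random integers a and b as in the passage cited for claim 1; this is a common stand-in for a random permutation and is not necessarily Oprisa's exact Equation 5. The function names and the prime are illustrative.

```python
import random
from typing import List, Sequence, Set, Tuple

PRIME = 2_147_483_647  # 2**31 - 1; assumed larger than any feature value


def make_hashes(k: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw k (a, b) pairs, each defining h(x) = (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def signature(features: Set[int], hashes: Sequence[Tuple[int, int]]) -> List[int]:
    """MinHash signature: the minimum hashed feature value under each function."""
    return [min((a * x + b) % PRIME for x in features) for a, b in hashes]


def estimate_jaccard(sig_x: Sequence[int], sig_y: Sequence[int]) -> float:
    """Fraction of hash functions on which the two minima agree."""
    matches = sum(1 for u, v in zip(sig_x, sig_y) if u == v)
    return matches / len(sig_x)
```

For two sets sharing 50 of 150 distinct features (exact Jaccard similarity 1/3), estimate_jaccard over a few hundred hash functions typically lands near 0.33. Identical sets always score 1.0, and disjoint sets score 0.0, because each h with a in [1, PRIME) is a bijection on the integers modulo PRIME.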
5. The method as recited in claim 1 (see above rejection), wherein a particular entry of the minimum hash information array of the derived set indicates a bound for a minimum hash value associated with a hash function and the derived set (pg 160 - The best indicator of the algorithm’s progress is the number of UNION operations performed in Algorithm 1, line 13. This operation is only performed if we found two similar samples that belong to different clusters. In this case, the two clusters will be joined and the total number of clusters will decrease by one. Since the number of clusters at the beginning of the algorithm is n (the number of samples) and we will have at least one cluster in the end, this operation is performed at most n − 1 times (this number is much smaller in practice)).
6. The method as recited in claim 5 (see above rejection), wherein the particular entry of the minimum hash information array is stored at a first index within the minimum hash information array (pg 160), the method further comprising performing, at the one or more computing devices:
determining that the bound is to be stored at the first index based at least in part on determining that an entry at the first index in the contributor count array of the derived set indicates a zero contributor count (pg 160 - The algorithm will also need an associative array (line 3) that represents the mapping between MinHash values and the sets of samples that have the same value).
7. The method as recited in claim 1 (see above rejection), further comprising performing, at the one or more computing devices:
obtaining an indication, via a programmatic interface, of one or more parameters of the similarity analysis task, wherein a parameter of the one or more parameters comprises one or more of: (a) a threshold criterion to be used to determine whether a set operation of a particular type is to be performed with respect to a pair of sets, (b) a termination criterion for an iteration, (c) a destination to which the result of the similarity analysis task is to be provided, or (d) an indication of one or more data sources from which information pertaining to the plurality of entities is to be obtained (pg 159 - Algorithm 1).
	Claims 8-13 and 15-20 are rejected for the same reasons set forth above for claims 1-6, because they recite substantially similar limitations directed to a system and to non-transitory computer-accessible storage media, respectively.

14. The system as recited in claim 8 (see above rejection), wherein the similarity analysis task comprises one or more of: (a) a classification task, (b) a co-reference resolution task, (c) a nearest neighbor search task, or (d) a generation of a similarity matrix for a kernel method of a support vector machine (pg 157 - The algorithm will comprise of several iterations, where such functions are used to partition the collection of samples into smaller groups, such that elements of the same group are likely to be similar).
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Evans et al. (US 2021/0004582) and Ertl ("SuperMinHash - A New Minwise Hashing Algorithm for Jaccard Similarity Estimation") teach some of the limitations recited in the independent claims.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAMUEL SHARPLESS whose telephone number is (571)272-1521. The examiner can normally be reached M-F, 7:30 AM - 3:30 PM (ET).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, MARK FEATHERSTONE can be reached on (571)270-3750. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.C.S./Examiner, Art Unit 2166
/MARK D FEATHERSTONE/Supervisory Patent Examiner, Art Unit 2166