DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 

Allowable Subject Matter
Claims 1-20 are allowable over the prior art.  However, the claims remain rejected under 35 USC §101.



Claim Rejections – 35 U.S.C. § 101
35 U.S.C. § 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. § 101 because the claimed invention is directed to non-statutory subject matter.

These claims are rejected under 35 USC §101 because the claimed invention is directed to an abstract idea without significantly more.  The claims recite, at a very high level, mathematically pre-processing/manipulating individual data items and then calculating a similarity value between two such data items.  Thus, the claims recite a series of mathematical steps that are not tied to a practical application.


Regarding independent claim 1: 
Statutory Category:  Yes, recites a series of steps executed (therefore a process).
 
Step 2A, Prong 1 (Judicial Exception Recited?):  Yes.  The claim recites a series of steps including scaling weights of features (i.e., “adjusting”, e.g., multiplying by a number such as “1”); expanding such weighted/scaled features into expanded sets (i.e., grouping); performing a minhash operation over the expanded sets (i.e., incrementally/iteratively hashing over data sets); performing minhash iterations (i.e., performing the hashing/mathematical operations until a termination condition is reached); and then calculating a Jaccard similarity value between two objects (i.e., calculating a similarity value between objects/items).  The claim encompasses performing mathematical operations on data, culminating in a mathematical comparison of values representative of data objects.  These concepts, under a broadest reasonable interpretation, encompass the solving of a math problem (i.e., using a multi-step mathematical concept for manipulating data elements).  Use of a mathematical concept integrated into a practical application may represent patent eligible subject matter, but the mere solving of a math problem is considered an abstract idea.
It is further noted that generic hardware is also claimed.  
For example, the claim limitation directed to “scaling …” merely encompasses mathematically adjusting a [weight] value.  Further, “expanding …” appears to refer to the grouping of like-valued data.  Minhashing is a mathematical operation, and the limitations directed to “minhash[ing] …” and “iterat[ing] …” are merely further directed to the mathematical processing of data in preparation for a final step of “calculat[ing]” a Jaccard similarity index value to enable a similarity comparison of resulting values.  See, for example, Applicant’s specification at [0006], which discusses these terms/steps as mathematical procedures.
If a claim limitation, under its broadest reasonable interpretation, covers performance of mathematical calculations or relationships but for the recitation of generic components, then it falls within the “Mathematical Concepts” grouping of abstract ideas.  
Other than reciting additional generic elements, such as processors and storage, nothing in the claim precludes its characterization as a mathematical concept.  For example, the claim encompasses the performance of mathematical operations/steps based upon mathematical relationships to transform data values and subsequently compare those transformed values.  These limitations are therefore reasonably characterized as encompassing mathematical concepts (i.e., an abstract idea).  
Accordingly, the claim recites an abstract idea, i.e., these limitations encompass a mathematical concept.

Step 2A, Prong 2 (Integrated into a Practical Application?):  No.  The claim recites a series of steps directed to associating a mathematically derived value with data objects in order to assess their similarity.  Under a broadest reasonable interpretation, other than reciting additional generic computing elements, such as processors and storage, nothing in the claim precludes characterization as a mathematical concept.  
The computing elements are recited at a high level of generality such that the claim amounts to no more than mere instructions to apply the exception using generic computer components.  Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose meaningful limits on practicing the abstract idea.  Therefore, the claim is directed to an abstract idea.

Step 2B (Inventive Concept Provided?):  No.  As discussed with respect to Step 2A, the elements (i.e., steps of scaling, expanding, minhashing, iterating and calculating) in the claim amount to no more than mere instructions to apply the exception.  Mere instructions to apply an exception using generic computer components (e.g., a processor and memory) cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.  
Therefore, the claim is not patent eligible, and is reasonably rejected under 35 USC §101.  


Claims 2-10 depend upon claim 1, and do not correct the issues set forth above.  Dependent claims 2-10 also recite further mathematical operations (calculations and/or manipulation/preprocessing of data / data values), and therefore are also rejected as claiming abstract, mathematical concepts.  


Independent claims 11 and 20 are each substantially similar to claim 1.  Therefore, these claims are likewise rejected.  


Claims 12-19 depend upon claim 11 and do not correct the issues set forth above.  Therefore, these claims are likewise rejected.


Therefore, the claims are not patent eligible, and are reasonably rejected under 35 USC §101.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  Relevance is provided in at least the Abstract of each cited document.

Non-Patent Literature
Shameem, Mushfeq-Us-Saleheen, et al., “An efficient K-Means Algorithm integrated with Jaccard Distance for Document Clustering”, AH-ICI 2009, Kathmandu, Nepal, November 3-5, 2009, 6 pages.
The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used for comparing the similarity and diversity of sample sets.  The Jaccard coefficient measures similarity between sample sets, and is defined as the size of the intersection divided by the size of the union of sample sets.  The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient. (page 2, 1st paragraph under B. Jaccard Distance Measure). 
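
For illustration only, the definition summarized above can be sketched as follows. The function names are hypothetical and are not taken from the cited reference, and the value returned for two empty sets is a convention assumed here.

```python
def jaccard_index(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """Dissimilarity, complementary to the Jaccard index."""
    return 1.0 - jaccard_index(a, b)


# Two sets sharing 2 of 4 distinct elements.
index = jaccard_index({1, 2, 3}, {2, 3, 4})
distance = jaccard_distance({1, 2, 3}, {2, 3, 4})
```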

Haeupler, Bernhard, et al., “Consistent Weighted Sampling Made Fast, Small, and Easy”, arXiv, Cornell University archive, document ID:  arXiv:1410.4266v1, October 16, 2014, pp. 1-18.
A simple randomized reduction that transforms any weighted set to an unweighted one, such that the Jaccard similarity between sets is approximately preserved. (page 4, 1st paragraph under 3.1 Weighted to Unweighted Reduction).  
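
As a simplified illustration of a weighted-to-unweighted reduction, the sketch below handles only the special case of non-negative integer weights, where the expansion is deterministic and preserves the weighted Jaccard similarity exactly. It is not the randomized reduction of the cited reference, which addresses general weights and preserves the similarity approximately. All names are hypothetical.

```python
def weighted_jaccard(s: dict, t: dict) -> float:
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    keys = set(s) | set(t)
    numerator = sum(min(s.get(k, 0), t.get(k, 0)) for k in keys)
    denominator = sum(max(s.get(k, 0), t.get(k, 0)) for k in keys)
    # Assumption: two all-zero weightings are treated as identical.
    return numerator / denominator if denominator else 1.0


def expand(weights: dict) -> set:
    """Replace an element of integer weight w by w distinct (element, i) pairs."""
    return {(k, i) for k, w in weights.items() for i in range(w)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


s = {"a": 3, "b": 1}
t = {"a": 1, "b": 2, "c": 1}
weighted = weighted_jaccard(s, t)
unweighted = jaccard(expand(s), expand(t))
```

In this example the two computed values agree, because each shared (element, i) pair in the expanded sets contributes one unit to the minimum and each pair in the union contributes one unit to the maximum.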

Manasse, Mark, et al., “Consistent Weighted Sampling”, Microsoft Technical Report, published June 2, 2010, 13 pages.
An efficient procedure for sampling representatives from a weighted set such that for any weightings S and T, the probability that the two choose the same sample is equal to the Jaccard similarity between them. (page 2, Abstract).

Ioffe, Sergey, “Improved Consistent Sampling, Weighted Minhash and L1 Sketching”, ICDM 2010, Sydney, NSW, Australia, December 13-17, 2010, pp. 246-255.
A new Consistent Weighted Sampling method, where the probability of drawing identical samples for a pair of inputs is equal to their Jaccard similarity.  The samples can be used as Weighted Minhash for efficient retrieval and compression [sketching] under Jaccard or L1 metrics.  A novel method of mapping hashes to short bit-strings, applied to Weighted Minhash, providing more accurate distance measurements than existing methods.  (page 246, Abstract).



US Patent Application Publications
Chandola 	 				2016/0269424
The MinHash vector comparison described above can be used as a filter used before doing more expensive set similarity computations (such as exact unweighted or weighted Jaccard Index computations, a cosine distance computation, a learned distance computation, or some other metric). In particular, set pairs that do not have an intersection, would have a zero score in similarity using the MinHash vector comparison, and thus could be filtered out such that exact similarities do not need to be computed. When embodiments obtain pairs that had a match in at least one (or some other predetermined threshold) MinHash, then, embodiments use these pairs for an exact set intersection. Thus, after the MinHash filtering, embodiments can then compute an exact similarity on remaining sets. (para 0050).  
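
The filtering approach summarized above can be sketched as follows, for illustration only. The hash construction, signature length, function names, and the `min_matches` parameter are assumptions made here and are not taken from the cited publication.

```python
import hashlib
from itertools import combinations


def minhash_signature(items: set, length: int = 16) -> list:
    """One minimum hash value per seeded hash function (non-empty sets only)."""
    def h(seed: int, item: str) -> int:
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")
    return [min(h(seed, item) for item in items) for seed in range(length)]


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b)


def filtered_similarities(sets: dict, min_matches: int = 1) -> dict:
    """Compute exact Jaccard only for pairs whose signatures share enough MinHashes."""
    signatures = {name: minhash_signature(s) for name, s in sets.items()}
    results = {}
    for x, y in combinations(sorted(sets), 2):
        matches = sum(p == q for p, q in zip(signatures[x], signatures[y]))
        if matches >= min_matches:  # cheap filter before the exact computation
            results[(x, y)] = jaccard(sets[x], sets[y])
    return results


sets = {"a": {"x", "y", "z"}, "b": {"x", "y", "z"}, "c": {"p", "q"}}
results = filtered_similarities(sets)
```

Pairs with no common elements share no minimum hash value (barring hash collisions) and are therefore dropped before the exact similarity is computed.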

Li 	 				2013/0151531
In general, the shingle-article matrix M may not fit into some memory, as the number of articles tends to be substantial. To alleviate this issue, the systems and methods can use a "Minhashing" technique to generate a succinct signature for each column in M, such that the probability of two articles having the same signature is equal to the Jaccard index similarity between the two articles. More particularly, the systems and methods can construct a Minhash signature of length-100, or other amounts, using one or more known techniques. In some cases, the randomized nature of the Minhash generation method can require further checks to increase the probability of uncovering all pairs of related articles in terms of the signature. Thus, the systems and methods utilize LSH to increase such probability. (para 0037).
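
For illustration only, the property noted above (the probability that two signatures agree at a position equals the Jaccard index) can be used to estimate similarity from length-100 signatures. The seeded SHA-256 hashing and the function names are assumptions made here, not the technique of the cited publication.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items: set, length: int = 100) -> list:
    """Signature of `length` minimum hash values; assumes a non-empty set."""
    return [min(_hash(seed, item) for item in items) for seed in range(length)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of signature positions at which the two signatures agree."""
    matches = sum(1 for p, q in zip(sig_a, sig_b) if p == q)
    return matches / len(sig_a)


# Two sets of 100 items sharing 50 items: exact Jaccard index is 50/150.
a = {str(i) for i in range(100)}
b = {str(i) for i in range(50, 150)}
estimate = estimate_jaccard(minhash_signature(a), minhash_signature(b))
```

The estimate is only approximate; its error shrinks as the signature length grows.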

Lysne 	 				2015/0154192
In the method, the act of executing the precision phase may include calculating a similarity metric using a similarity function that references the probe vector and a vector corresponding to the at least one candidate object, and determining whether the similarity metric transgresses the predefined similarity threshold. In one embodiment, the act of calculating the similarity metric may include combining Cosine similarity and a Jaccard index. In another embodiment, the method may include the act of executing an anti-aliasing phase using a meta-store. (para 0080).  The Jaccard index for two sets A and B is defined as the intersection/union ratio and measures the overlap between vector dimensions; the more overlap the higher the similarity. (para 0199).  Generating the at least one internal representation may include calculating at least one normalized value of at least a portion of the information that specifies a weight of the at least one feature. (paras 0022, 0026).  The object manager normalizes the weights of the internal representations created in act 3908. In one embodiment, the object manager encodes each weight_n as mag_n using a scaling and normalization function such that the maximum weight maps to a predefined maximum (e.g., 2^6 - 1 = 63) and the minimum weight maps to a predefined minimum. (para 0191).
The computing system of claim 9, wherein generating the one or more internal vectors representative of the probe object based on the one or more external vectors representative of the probe object comprises: applying a hashing function to each feature of the plurality of features of at least one external vector representative of the probe object; applying at least one of a scaling function and a normalization function to each weight associated with each feature of the plurality of features of the at least one external vector representative of the probe object; and wherein the one or more internal vectors representative of the probe object are represented in a fixed number of bits, and wherein the corresponding one or more external vectors representative of the probe object are represented in a variable number of bits (claim 10).  In some embodiments, the query engine is configured to implement quick and highly scalable object lookup by fingerprint during the query recall phase through the permutation indices. (Fig. 21, para 0237).
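
For illustration only, a scaling and normalization function of the kind summarized above can be sketched as a linear mapping of weights onto a fixed integer range. The linear form, the lower bound of 1, and the names used are assumptions made here; the cited publication specifies only that the maximum weight maps to a predefined maximum (e.g., 63) and the minimum weight to a predefined minimum.

```python
def scale_weights(weights: list, target_min: int = 1, target_max: int = 63) -> list:
    """Linearly map weights so min(weights) -> target_min and max(weights) -> target_max."""
    w_min, w_max = min(weights), max(weights)
    if w_max == w_min:
        # Assumption: when all weights are equal, use the predefined maximum.
        return [target_max] * len(weights)
    span = w_max - w_min
    return [
        target_min + round((w - w_min) * (target_max - target_min) / span)
        for w in weights
    ]


# Each result fits in 6 bits because it never exceeds 63.
magnitudes = scale_weights([0.0, 0.5, 1.0])
```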

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Robert Stevens, whose telephone number is (571) 272-4102.  The examiner can normally be reached on M-F 6:00 – 2:30.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached on (571) 272-0631.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ROBERT STEVENS/
Primary Examiner, Art Unit 2164




July 16, 2022