DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Claims 1-28 are rejected under 35 U.S.C. 103(a) as being unpatentable over Otmar Ertl (BagMinHash - Minwise Hashing Algorithm for Weighted Sets. KDD 2018, Research Track Paper. 8/2018) in view of Brkic (UK Pat. App. 2540562).

Regarding claim 1, Ertl discloses a computer implemented method of improving similarity distance approximation between data sets, comprising: 
receiving a data set (p. 1369, section 1.3 – Application, receiving data); 
defining a signature for the data set, where the signature is an array of values concatenated together and each element in the signature array has an array index value (p. 1369, section 2, signature component for hash values); and 
extracting a group of data features from the data set, where each data feature has an associated nonnegative weight and the weight is discretized in a particular interval in a set of intervals using a discretization method (p. 1369, section 1.3 – Application: extraction), such that the particular interval is between a lower bound and an upper bound and length of intervals in the set of intervals varies (p. 1369, 1st para., 2nd column); and 
for a given data feature in the group of data features and for each interval in the set of intervals with a lower bound that is lower than weight associated with the given data feature, seeding a pseudorandom number generator based in part on value for the given data feature and in part on value of discretization index for a given interval in the set of intervals (section 2.2, random numbers) and updating a value in a given element of the signature array using the pseudorandom number generator, wherein the value for the given element of the signature array is updated (p. 1371, 1st column) by 
defining an exponential distribution with a rate parameter set to length of the given interval (p. 1371, 1st column); 
selecting the update value for the given element of the signature array from the exponential distribution using at least one random number from the pseudorandom number generator (pp. 1370-1371); 
retrieving a value from the given element of the [signature] array (p. 1372, 2nd column: array); 
comparing the updated value to the value retrieved from the given element (p. 1372); and 
updating the value retrieved from the given element with the updated value when the updated value is less than the value retrieved from the given element of the signature array (p. 1372, 2nd column: update).
Ertl does not explicitly disclose “signature array;” however, Brkic discloses “signature array” (p. 8, lines 11-30; updating signature array value).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Brkic into Ertl to combine with the existing signature for the data array region to provide an updated signature with the appropriate modification to the existing signature value (Brkic, p. 8, lines 20-24).
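
For illustration, the update recited in claim 1 can be sketched as follows. This is a minimal Python sketch under stated assumptions, not code taken from Ertl, Brkic, or the application. The name update_signature, the (lower, upper) tuple representation of the intervals, and the string seed built from the feature value and the interval index are hypothetical. The sketch visits the signature elements in a fixed order and uses the full interval length as the rate parameter, as the claim language reads.

```python
import math
import random


def update_signature(signature, feature, weight, intervals):
    """Update `signature` in place for one weighted feature (hypothetical helper).

    intervals: list of (lower, upper) bounds whose lengths may vary.
    """
    for index, (lower, upper) in enumerate(intervals):
        if lower >= weight:
            continue  # only intervals whose lower bound is below the weight
        # Seed in part from the feature value, in part from the interval index.
        rng = random.Random(f"{feature}:{index}")
        rate = upper - lower  # rate parameter set to the interval length
        for slot in range(len(signature)):
            candidate = rng.expovariate(rate)  # draw from Exp(rate)
            retrieved = signature[slot]
            if candidate < retrieved:  # keep the smaller value
                signature[slot] = candidate
    return signature
```

Starting from a signature filled with math.inf, every element is overwritten by the first qualifying interval and can only decrease afterwards.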

Regarding claim 2, Ertl in view of Brkic disclose the method of claim 1 further comprises updating values in each element of the signature array using the pseudorandom number generator according to a predefined sequence of the signature array (section 2.2: random generator).

Regarding claim 3, Ertl in view of Brkic disclose the method of claim 1 wherein selecting the update value for the given element of the signature array further comprises fetching a first pseudo random number from the pseudo random number generator, where the pseudo random numbers generated by the pseudo random number generator follow a given distribution given by the pseudo random number generator, transforming the first pseudo random number into a second pseudo random number that follows an exponential distribution using a transformation method and using the second pseudo random number as the update value for the given element (section 2.2: random generator).
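
For illustration, one common way to transform a first pseudo random number into a second one that follows an exponential distribution is inverse transform sampling. The sketch below assumes that the generator output is uniform on [0, 1) and that the "transformation method" is the inverse of the exponential cumulative distribution function; neither assumption is stated in the claim or confirmed by the cited section, and the function name is hypothetical.

```python
import math
import random


def exponential_from_uniform(u, rate):
    """Map u in [0, 1) to an Exp(rate) variate via the inverse CDF."""
    # If U ~ Uniform[0, 1), then -ln(1 - U) / rate ~ Exp(rate).
    return -math.log1p(-u) / rate


rng = random.Random(42)           # seed value chosen only for this example
first = rng.random()              # first pseudo random number, uniform
update_value = exponential_from_uniform(first, rate=2.0)  # second number
```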

Regarding claim 4, Ertl in view of Brkic disclose the method of claim 1 wherein updating a value in a given element of the signature array further comprises randomly selecting an array index value and updating the value of the element in the signature array corresponding to the selected array index value (Brkic, p. 8).

Regarding claim 5, Ertl in view of Brkic disclose the method of claim 4 further comprises selecting the update value for the given element of the signature array such that the update value increases each iteration (p. 1371, 2nd column iterations of values).

Regarding claim 6, Ertl in view of Brkic disclose the method of claim 5 wherein updating a value in a given element of the signature array further comprises repeating the steps of randomly selecting an array index value and updating the value of the element in the signature array corresponding to the selected array index value until the update value is greater than highest value in the signature array (p. 1371: “Therefore, as soon as some point is greater than h_max the Poisson process can be stopped. If h_max is simultaneously updated each time some signature value h_i is replaced by a lower value, h_max will decrease over time and the termination condition is satisfied earlier in subsequent iterations of d and l.”).
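
For illustration, the stopping rule addressed in claims 4 through 6 can be sketched as follows. This is a minimal sketch, not Ertl's implementation: the function name is hypothetical, the increasing update values are produced by summing exponential gaps, and the highest signature value is recomputed with max() on every iteration for clarity rather than maintained incrementally.

```python
import math
import random


def run_poisson_process(signature, rng, rate):
    """Update random elements until a point exceeds the highest value."""
    point = 0.0
    while True:
        point += rng.expovariate(rate)  # update value increases each iteration
        h_max = max(signature)          # highest value currently stored
        if point > h_max:
            break                       # no later point can lower any element
        slot = rng.randrange(len(signature))  # randomly selected array index
        if point < signature[slot]:
            signature[slot] = point
    return signature
```

Because the points only grow while h_max only shrinks, the loop ends once every element has been filled and a point passes the largest stored value.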

Regarding claim 7, Ertl in view of Brkic disclose the method of claim 1 wherein the extracted data feature is one of a severity indicator for a logged event and a text message that describes the logged event.

Regarding claim 8, Ertl in view of Brkic disclose the method of claim 1 further comprises computing a similarity measure for the data set using the signature for the data set (section 1.1); comparing the similarity measure for the data set to a similarity measure for another data set (section 1.1); and updating a metric describing performance of the computer system based upon the comparison (section 1.1).

Regarding claim 9, Ertl in view of Brkic disclose the method of claim 8 wherein comparing the similarity measure further comprises estimating a Jaccard index from the signature for the data set and the signature for another data set (section 1.1, Jaccard estimates for small n compared to m).
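
For illustration, a Jaccard index is commonly estimated from two minwise signatures as the fraction of array positions at which the signatures agree. The sketch below assumes signatures of equal length built with the same seeds; the function name is hypothetical.

```python
def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions at which two equal-length signatures agree."""
    if len(sig_a) != len(sig_b) or not sig_a:
        raise ValueError("signatures must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Three of four positions agree.
print(estimate_jaccard([1, 2, 3, 4], [1, 2, 0, 4]))  # prints 0.75
```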

Regarding claim 10, Ertl discloses a computer implemented method of improving similarity distance approximation between data sets, comprising: 
receiving a data set (p. 1369, section 1.3 – Application, receiving data);
defining a signature for the data set, where the signature is an array of values concatenated together and each value in the array has an array index value (p. 1369, section 2, signature component for hash values); and 
extracting a group of data features from the data set, where each data feature has an associated weight and the weight is discretized in a particular interval in a set of intervals using a discretization method and each interval in the set of intervals is defined by a lower bound and an upper bound (section 2.2, random numbers; p. 1371, 1st column);
for a given data feature in the group of data features and for each interval in the set of intervals with a lower bound that is lower than the weight associated with the given data feature, 
a) initializing value of an accumulator (p. 1374); 
b) seeding a pseudorandom number generator based in part with a value of the given data feature (section 2.2: seeding); 
c) generating a random number from an exponential distribution using the pseudorandom number generator (section 2.2: random number);
d) generating an update value for the signature by summing the random number with the value of the accumulator (section 2.2);
e) generating an array index value for the signature by randomly selecting the array index value from a uniform distribution of array index values for the signature (section 2.2, uniform distributed); 
f) retrieving a value from the signature at the array index value (Brkic, p. 8); 
g) comparing the updated value to the value retrieved from the signature (Brkic p. 8); and 
h) updating the value retrieved from the signature with the updated value when the updated value is less than the value retrieved from the signature (Brkic, p. 8); 
repeating step a) -h) for each data feature in the group of data features (Brkic, p. 8).
Ertl does not explicitly disclose “signature array;” however, Brkic discloses “signature array” (p. 8, lines 11-30; updating signature array value).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Brkic into Ertl to combine with the existing signature for the data array region to provide an updated signature with the appropriate modification to the existing signature value (Brkic, p. 8, lines 20-24).


Regarding claim 11, Ertl in view of Brkic disclose the method of claim 10 further comprises defining an exponential distribution with a rate parameter set to length of the corresponding interval, where length of intervals in the set of intervals varies (p. 1373, range/size of index).

Regarding claim 12, Ertl in view of Brkic disclose the method of claim 10 further comprises performing step c) and d) using a Poisson process (Section 3.1).

Regarding claim 13, Ertl in view of Brkic disclose the method of claim 1 further comprises maintaining a maximum value in the signature and repeating steps c) - h) for the given data feature until the updated value exceeds the maximum value in the signature (pp. 1370-1371).

Regarding claim 14, Ertl in view of Brkic disclose the method of claim 13 further comprises maintaining a maximum value in the signature using a binary tree (p. 1371, last 2 paragraphs).
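
For illustration, a maximum over the signature can be maintained with a complete binary tree whose leaves hold the signature values and whose internal nodes hold the larger of their two children. The class below is a minimal sketch under that assumption; the name MaxTree and its methods are hypothetical and are not taken from Ertl or the application.

```python
class MaxTree:
    """Array-backed binary tree; root holds the maximum signature value."""

    def __init__(self, m, initial=float("inf")):
        self.m = m
        # Internal nodes at 0..m-2, leaves (signature values) at m-1..2m-2.
        self.nodes = [initial] * (2 * m - 1)

    def maximum(self):
        return self.nodes[0]

    def get(self, i):
        return self.nodes[self.m - 1 + i]

    def lower(self, i, value):
        """Replace element i if `value` is smaller; return True on update."""
        pos = self.m - 1 + i
        if value >= self.nodes[pos]:
            return False
        self.nodes[pos] = value
        while pos > 0:
            parent = (pos - 1) // 2
            sibling = pos + 1 if pos % 2 == 1 else pos - 1
            new_max = max(self.nodes[pos], self.nodes[sibling])
            if new_max == self.nodes[parent]:
                break  # ancestors are unaffected
            self.nodes[parent] = new_max
            pos = parent
        return True
```

Each update touches at most one node per tree level, so the maximum is available at the root without rescanning the whole signature.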

Regarding claim 15, Ertl in view of Brkic disclose the method of claim 10 wherein the extracted data feature is one of a severity indicator for a logged event and a text message that describes the logged event (p. 1368).

Regarding claim 16, Ertl in view of Brkic disclose the method of claim 10 further comprises computing a similarity measure for the data set using the signature for the data set (section 1.1); comparing the similarity measure for the data set to a similarity measure for another data set (section 1.1); and updating a metric describing performance of the computer system based upon the comparison (section 1.1).

Regarding claim 17, Ertl in view of Brkic disclose the method of claim 16 wherein comparing the similarity measure further comprises estimating a Jaccard index from the signature for the data set and the signature for another data set (section 1.1: Jaccard estimation).

Regarding claim 18, Ertl discloses a computer implemented method of improving similarity distance approximation between data sets, comprising: 
defining a signature for a data set, where the signature is an array of values concatenated together and each value in the array has an array index value (p. 1369, section 2, signature component for hash values); 
receiving a given data feature extracted from the data set, where the given data feature has an associated weight and the weight is discretized in a particular interval in a set of discretization intervals using a discretization method, wherein each discretization interval in the set of discretization intervals has an interval index and is defined by a lower bound and an upper bound (section 2.2, random numbers; p. 1371, 1st column); 
seeding a pseudorandom number generator with a value of the given data feature (section 2.2); 
defining a parent Poisson process that represents a discretization range for possible values of the weight, where a rate parameter for the parent Poisson process is proportional to the discretization range and the parent Poisson process is defined using the pseudorandom number generator (section 2.2);
generating a random value as a signature candidate value using the parent Poisson process (section 3.1); 
a) splitting the parent Poisson process into at least two child Poisson processes by assigning a portion of the discretization range of the parent Poisson process to each of the at least two child Poisson processes, such that the entirety of the discretization range is assigned to the at least two child Poisson processes and discretization ranges represented by each of the at least two child Poisson process do not overlap each other and boundaries of the discretization ranges represented by each of the at least two child Poisson process align with a boundary of a discretization interval in the set of discretization intervals (section 3.1, Poisson Process); 
b) randomly selecting to which of the at least two child Poisson processes the signature candidate value belong to (section 3.1, Poisson Process); 
c) repeating steps a) and b) using the selected child Poisson process as the parent Poisson process until the discretization range represented by the selected child Poisson process corresponds in size to only one discretization interval in the set of discretization intervals (section 3.1, Poisson Process); 
d) comparing a boundary of the discretization range represented by the selected child Poisson process to the weight associated with the given data feature (section 3.1, Poisson Process); and 
e) updating an element of the signature using the signature candidate value in response to the weight associated with the given data feature being larger than the boundary of the discretization range represented by the selected child Poisson process (section 3.1, Poisson Process).
Ertl does not explicitly disclose “signature is an array value;” however, Brkic discloses “signature is an array value” (p. 8, lines 11-30; updating signature array value).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate Brkic into Ertl to combine with the existing signature for the data array region to provide an updated signature with the appropriate modification to the existing signature value (Brkic, p. 8, lines 20-24).

Regarding claim 19, Ertl in view of Brkic disclose the method of claim 18 wherein the random selection of the at least two child Poisson processes is biased by size of the discretization range represented by each of the at least two child Poisson processes (section 3.1, Poisson Process).
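
For illustration, the repeated splitting and size-biased selection addressed in claims 18 and 19 can be sketched as follows. This is a minimal sketch under stated assumptions, not Ertl's implementation: the function name is hypothetical, each split is into exactly two children at the middle interval boundary, and the discretization intervals are given as a sorted list of boundaries.

```python
import random


def select_interval(rng, bounds):
    """Descend from the full range to one discretization interval.

    bounds: sorted boundaries b[0] < b[1] < ...; interval k is [b[k], b[k+1]).
    Returns the index k of the selected interval.
    """
    lo, hi = 0, len(bounds) - 1          # parent covers intervals lo..hi-1
    while hi - lo > 1:
        mid = (lo + hi) // 2             # split on an interval boundary
        left_size = bounds[mid] - bounds[lo]
        total = bounds[hi] - bounds[lo]
        # Choose a child with probability proportional to its range size.
        if rng.random() < left_size / total:
            hi = mid                     # left child becomes the parent
        else:
            lo = mid                     # right child becomes the parent
    return lo
```

Under steps d) and e) of claim 18, the candidate value would then be applied to the signature only when the feature weight is larger than bounds[k], the lower boundary of the selected range.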

Regarding claim 20, Ertl in view of Brkic disclose the method of claim 18 further comprises, for each of the at least two child Poisson processes not selected in step b), generating a random value by the unselected child Poisson process and storing the unselected child Poisson process in a buffer, where the buffer is organized in ascending order according to the random values of the unselected child Poisson processes (section 3.1, Poisson Process).

Regarding claim 21, Ertl in view of Brkic disclose the method of claim 20 further comprises f) fetching a Poisson process with the lowest random value from the buffer and performing steps a) - e) with the fetched Poisson process, where the random value of the fetched Poisson process is the signature candidate value (section 3.1, Poisson Process).
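
For illustration, a buffer organized in ascending order of random values, from which the entry with the lowest value is fetched, behaves like a priority queue. The sketch below assumes a binary min-heap as the buffer; the class name, its methods, and the use of arbitrary Python objects to stand in for Poisson processes are hypothetical.

```python
import heapq
import itertools


class ProcessBuffer:
    """Min-heap of (random value, process) entries."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker for equal values

    def store(self, value, process):
        heapq.heappush(self._heap, (value, next(self._counter), process))

    def fetch_lowest(self):
        """Remove and return the entry with the lowest random value."""
        value, _, process = heapq.heappop(self._heap)
        return value, process

    def __len__(self):
        return len(self._heap)
```

The value returned by fetch_lowest would serve as the next signature candidate value in the sense of claim 21.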

Regarding claim 22, Ertl in view of Brkic disclose the method of claim 21 further comprises repeating steps a) and b) using the selected child Poisson process as the parent Poisson process until the discretization range represented by the selected child Poisson process corresponds in size to an interval in the set of intervals or the weight is less than the lower bound of the discretization range of the selected child Poisson process (section 3.1, Poisson Process).

Regarding claim 23, Ertl in view of Brkic disclose the method of claim 20 wherein updating an element of the signature further comprises randomly selecting the element of the signature on which the update is performed (Brkic, p. 8).

Regarding claim 24, Ertl in view of Brkic disclose the method of claim 21 further comprises maintaining a maximum value in the signature and repeating steps a) - f) until the candidate value exceeds the maximum value in the signature (pp. 1370-1371).

Regarding claim 25, Ertl in view of Brkic disclose the method of claim 18 wherein the given data feature is one of a severity indicator for a logged event and a text message that describes the logged event (p. 1368).

Regarding claim 26, Ertl in view of Brkic disclose the method of claim 18 further comprises computing a similarity measure for the data set using the signature for the data set (section 1.1); comparing the similarity measure for the data set to a similarity measure for another data set (section 1.1); and updating a metric describing performance of the computer system based upon the comparison (section 1.1).

Regarding claim 27, Ertl in view of Brkic disclose the method of claim 26 wherein comparing the similarity measure further comprises estimating a Jaccard index from the signature for the data set and the signature for another data set (section 1.1: Jaccard estimation).

Regarding claim 28, Ertl in view of Brkic disclose the method of claim 18, where step c) further comprises using the sizes of the discretization ranges assigned to the child Poisson processes to perform a random selection that selects a given child Poisson process with a probability that depends on the size of the discretization range assigned to the given child Poisson process (see: section 3.1, Poisson Process).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TUANKHANH D PHAN whose telephone number is (571)270-3047. The examiner can normally be reached on Mon-Fri, 10:00am-6:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam, can be reached at 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 or 571-272-1000.
/TUANKHANH D PHAN/
Examiner, Art Unit 2154