Low Discrepancy Sets Yield Approximate Min-Wise Independent Permutation Families

31 January 2000

New Image

Motivated by a problem of filtering near-duplicate Web documents, Broder, Charikar, Frienze & Mitzenmacher defined the following notion of epsilon-approximate min-wide independent permutation families. A multiset F of permutations of {0,1,...,n - 1} is such a family if for all K bar subset {0,1,...,n - 1} and any alpha element of K, a permutation tau chosen uniformly at random from F satisfies | Pr[min{tau(K)} = tau (alpha)] - 1 over |K| | = epsilon over |K|.