COCA filters: co-occurrence aware Bloom filters
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Efficient Jaccard-based diversity analysis of large document collections
Proceedings of the 21st ACM international conference on Information and knowledge management
Compressed representation of web and social networks via dense subgraphs
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
A novel approach for leveraging co-occurrence to improve the false positive error in signature files
Journal of Discrete Algorithms
A family of permutations F ⊆ S_n (the symmetric group) is called min-wise independent if, for any set X ⊆ [n] and any x ∈ X, when a permutation π is chosen uniformly at random from F we have Pr(min{π(X)} = π(x)) = 1/|X|. In other words, we require that all the elements of any fixed set X have an equal chance of becoming the minimum element of the image of X under π. The rigorous study of such families was instigated by the fact that such a family (under some relaxations) is essential to the algorithm used by the AltaVista Web indexing software to detect and filter near-duplicate documents. The insights gained from theoretical investigations led to practical changes, which in turn inspired new mathematical inquiries and results. This talk will review the current research in this area and will trace the interplay of theory and practice that motivated it.
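The definition can be checked by brute force for a small n. The following is a minimal sketch, not taken from the talk: the helper names `min_probabilities` and `is_min_wise_independent` are hypothetical, and [n] is assumed to be the 0-indexed set {0, ..., n-1}. It confirms that the full symmetric group S_4 satisfies the condition, while the family of cyclic shifts does not.

```python
from fractions import Fraction
from itertools import permutations


def min_probabilities(family, X):
    """For each x in X, return Pr[min(pi(X)) == pi(x)] with pi uniform over family.

    Each permutation is a tuple pi where pi[i] is the image of element i.
    """
    counts = {x: 0 for x in X}
    for pi in family:
        # The element of X whose image is smallest (no ties: pi is a bijection).
        argmin = min(X, key=lambda x: pi[x])
        counts[argmin] += 1
    return {x: Fraction(c, len(family)) for x, c in counts.items()}


def is_min_wise_independent(family, n):
    """Check the definition exhaustively for every nonempty X within {0..n-1}."""
    for mask in range(1, 1 << n):
        X = [i for i in range(n) if (mask >> i) & 1]
        target = Fraction(1, len(X))
        if any(p != target for p in min_probabilities(family, X).values()):
            return False
    return True


n = 4  # illustrative size; the check is exponential in n
full_group = list(permutations(range(n)))  # all of S_n
cyclic_shifts = [tuple((i + s) % n for i in range(n)) for s in range(n)]

print(is_min_wise_independent(full_group, n))
print(is_min_wise_independent(cyclic_shifts, n))
print(min_probabilities(cyclic_shifts, [0, 1]))
```

For the cyclic shifts and X = {0, 1}, element 0 attains the minimum under three of the four shifts, so the two elements do not have equal chances and the family fails the definition.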