Many applications use sequences of n consecutive symbols (n-grams). Hashing these n-grams can be a performance bottleneck. For more speed, recursive hash families compute hash values by updating previous values. We prove that recursive hash families cannot be more than pairwise independent. While hashing by irreducible polynomials is pairwise independent, our implementations either run in time O(n) or use an exponential amount of memory. As a more scalable alternative, we make hashing by cyclic polynomials pairwise independent by ignoring n-1 bits. Experimentally, we show that hashing by cyclic polynomials is twice as fast as hashing by irreducible polynomials. We also show that randomized Karp-Rabin hash families are not pairwise independent.
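
To make the recursive update concrete, the following is a minimal sketch of hashing by cyclic polynomials over a sliding window of n bytes. It is an illustration, not the authors' implementation. It assumes L-bit hash values with n <= L, a randomly filled lookup table for individual symbols, and that the n-1 discarded bits are the low-order ones. The function names (`rotl`, `make_table`, `cyclic_hash_direct`, `rolling_cyclic_hashes`), the 256-symbol alphabet, and the seeded `random.Random` generator are hypothetical choices made for this example.

```python
import random


def rotl(x, r, L):
    """Rotate the L-bit integer x left by r bits (circular shift)."""
    r %= L
    mask = (1 << L) - 1
    return ((x << r) | (x >> (L - r))) & mask


def make_table(L, seed=0, alphabet_size=256):
    """Assign a random L-bit value to each symbol (assumed byte alphabet)."""
    rng = random.Random(seed)
    return [rng.getrandbits(L) for _ in range(alphabet_size)]


def cyclic_hash_direct(window, table, L):
    """Hash one n-gram from scratch in O(n): rotate, then XOR in each symbol."""
    h = 0
    for symbol in window:
        h = rotl(h, 1, L) ^ table[symbol]
    return h


def rolling_cyclic_hashes(data, n, L=32, seed=0):
    """Yield one hash per n-gram of data, updating the previous value in O(1).

    Each yielded value drops the n-1 low-order bits of the L-bit state
    (an assumption of this sketch about which bits to ignore).
    """
    if not 1 <= n <= L:
        raise ValueError("this sketch requires 1 <= n <= L")
    if len(data) < n:
        return
    table = make_table(L, seed)
    h = cyclic_hash_direct(data[:n], table, L)
    yield h >> (n - 1)
    for i in range(n, len(data)):
        outgoing = rotl(table[data[i - n]], n, L)  # contribution of the symbol leaving the window
        h = rotl(h, 1, L) ^ outgoing ^ table[data[i]]
        yield h >> (n - 1)
```

For example, `list(rolling_cyclic_hashes(b"the quick brown fox", n=4))` produces one (L-n+1)-bit value per 4-gram. Each update costs one rotation of the state, one rotation of the outgoing symbol's value, and two XORs, regardless of n.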