Many indexing, retrieval, and comparison methods are based on counting or cataloguing n-grams in streams of symbols. The fastest way to implement such operations is with hash tables. Rapid hashing of consecutive n-grams is best done using a recursive hash function, in which the hash value of the current n-gram is derived from the hash value of its predecessor. This article generalizes recursive hash functions found in the literature and proposes new methods offering superior performance. Experimental results demonstrate substantial speed improvement over conventional approaches, while retaining near-ideal hash value distribution.
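To make the idea of a recursive hash concrete, the following is a minimal sketch of a Karp-Rabin style polynomial rolling hash, which is one well-known recursive scheme and not necessarily any of the methods the article proposes. The function names `rolling_ngram_hashes` and `direct_hash`, and the choices of `base` and `modulus`, are assumptions made for illustration only. Each new hash value is obtained from the previous one in constant time by removing the contribution of the symbol that leaves the window and adding the symbol that enters it.

```python
from collections import Counter

# Illustrative parameters (assumptions, not taken from the article).
BASE = 257
MODULUS = (1 << 61) - 1


def direct_hash(symbols, base=BASE, modulus=MODULUS):
    """Hash one n-gram from scratch; used here only as a reference."""
    h = 0
    for s in symbols:
        h = (h * base + s) % modulus
    return h


def rolling_ngram_hashes(symbols, n, base=BASE, modulus=MODULUS):
    """Yield (start_index, hash) for every n-gram of `symbols`.

    `symbols` is a sequence of non-negative integers, such as a bytes object.
    After the first window, each hash is derived from its predecessor.
    """
    if n <= 0 or len(symbols) < n:
        return
    # Weight carried by the oldest symbol in the window.
    top = pow(base, n - 1, modulus)
    h = direct_hash(symbols[:n], base, modulus)
    yield 0, h
    for i in range(n, len(symbols)):
        # Drop the outgoing symbol, shift the window, add the incoming symbol.
        h = ((h - symbols[i - n] * top) * base + symbols[i]) % modulus
        yield i - n + 1, h


if __name__ == "__main__":
    data = b"abracadabra"
    # Count trigrams by hash value, as a hash-table based n-gram counter would.
    counts = Counter(h for _, h in rolling_ngram_hashes(data, 3))
    print(counts[direct_hash(b"abr")])
```

In this sketch the per-symbol cost is independent of n, whereas calling `direct_hash` on every window would cost time proportional to n for each position. Counting by hash value alone treats colliding n-grams as identical, so a real counter would also need to store or compare the n-grams themselves.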