Recursive hashing functions for n-grams

  • Author: Jonathan D. Cohen
  • Affiliation: National Security Agency, Fort Meade, MD
  • Venue: ACM Transactions on Information Systems (TOIS)
  • Year: 1997

Abstract

Many indexing, retrieval, and comparison methods are based on counting or cataloguing n-grams in streams of symbols. Hash tables provide the fastest way to implement such operations. Rapid hashing of consecutive n-grams is best done with a recursive hash function, in which the hash value of the current n-gram is derived from the hash value of its predecessor. This article generalizes recursive hash functions found in the literature and proposes new methods offering superior performance. Experimental results demonstrate substantial speed improvements over conventional approaches while retaining a near-ideal distribution of hash values.
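To illustrate what "recursive" means here, the sketch below uses a Karp-Rabin-style polynomial rolling hash. This is one well-known recursive family, not necessarily one of the new methods the article proposes. Each n-gram's hash is obtained from the previous one in constant time by removing the outgoing symbol and adding the incoming one, and the resulting values are then counted in a hash table. The names `BASE`, `MOD`, `direct_hash`, and `rolling_hashes` and the chosen constants are illustrative assumptions.

```python
from collections import Counter
from typing import Iterator, Sequence

# Illustrative parameters (assumed, not taken from the article).
BASE = 257           # multiplier applied per symbol; exceeds any byte value
MOD = (1 << 61) - 1  # large prime modulus (the Mersenne prime 2^61 - 1)


def direct_hash(ngram: Sequence[int]) -> int:
    """Hash one n-gram from scratch: sum of s_i * BASE^(n-1-i), mod MOD."""
    h = 0
    for s in ngram:
        h = (h * BASE + s) % MOD
    return h


def rolling_hashes(symbols: Sequence[int], n: int) -> Iterator[int]:
    """Yield the hash of every consecutive n-gram, each derived from its predecessor."""
    if n <= 0 or len(symbols) < n:
        return
    top = pow(BASE, n - 1, MOD)  # weight carried by the symbol leaving the window
    h = direct_hash(symbols[:n])  # only the first n-gram is hashed directly
    yield h
    for i in range(n, len(symbols)):
        # Drop the outgoing symbol, shift by BASE, add the incoming symbol: O(1) per step.
        h = ((h - symbols[i - n] * top) * BASE + symbols[i]) % MOD
        yield h


# Usage: count the 3-grams of a byte string through their hash values.
text = b"abracadabra"
counts = Counter(rolling_hashes(text, 3))
print(len(counts), counts[direct_hash(b"abr")])  # → 7 2
```

Hashing each n-gram directly costs O(n) per position. The recursive update costs O(1) regardless of n, which is the speed advantage the abstract refers to. The quality of the hash value distribution depends on the choice of function and parameters, and this sketch makes no claim to match the distribution results reported in the article.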