AS-index: a structure for string search using n-grams and algebraic signatures

  • Authors:
  • Cédric du Mouza;Witold Litwin;Philippe Rigaux;Thomas Schwarz

  • Affiliations:
  • CNAM, Paris, France;CERIA, Paris, France;INRIA-SACLAY, Orsay, France;Santa Clara Univ., Santa Clara, USA

  • Venue:
  • Proceedings of the 18th ACM conference on Information and knowledge management
  • Year:
  • 2009

Quantified Score

Hi-index 0.01

Visualization

Abstract

AS-Index is a new index structure for exact string search in disk resident databases. It uses hashing, unlike known alternatives, whether baesd on trees or tries. It typically indexes every n-gram in the database, though non-dense indexing is also possible. The hash function uses the algebraic signatures of all n-grams. Use of hashing provides for constant index access time for arbitrarily long patterns, unlike other structures whose search cost is at best logarithmic. The storage overhead of AS-Index is basically 500 - 600%, similar to that of alternatives or smaller. We show the index structure, our use of algebraic signatures and the search algorithm. We discuss the design trade-offs and present the theoretical and experimental performance analysis. We compare the AS-Index to main alternatives. We conclude that AS-Index is an attractive structure and we indicate directions for future work.