SPINE: Putting Backbone into String Indexing

  • Authors:
  • Naresh Neelapala; Romil Mittal; Jayant R. Haritsa

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

The indexing technique commonly used for long strings, such as genomes, is the suffix tree, which is based on a vertical (intra-path) compaction of the underlying trie structure. In this paper, we investigate an alternative approach to index building, based on horizontal (inter-path) compaction of the trie. In particular, we present SPINE, a carefully engineered horizontally-compacted trie index. SPINE consists of a backbone formed by a linear chain of nodes representing the underlying string, with the nodes connected by a rich set of edges for facilitating fast forward and backward traversals over the backbone during index construction and query search. A special feature of SPINE is that it collapses the trie into a linear structure, representing the logical extreme of horizontal compaction.

We describe algorithms for SPINE construction and for searching this index to find the occurrences of query patterns. Our experimental results on a variety of real genomic and proteomic strings show that SPINE requires significantly less space than standard implementations of suffix trees. Further, SPINE takes less time for both construction and search as compared to suffix trees, especially when the index is disk-resident. Finally, the linearity of its structure makes it more amenable to integration with database engines.
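To make the starting point concrete, the following minimal Python sketch builds the uncompacted suffix trie that both suffix trees (vertical compaction) and SPINE (horizontal compaction) set out to shrink. It is not the SPINE algorithm, whose construction and edge types are not given in the abstract. The names `build_suffix_trie`, `find_occurrences`, and `count_nodes` are hypothetical and chosen only for illustration.

```python
def build_suffix_trie(text):
    """Build a naive, uncompacted suffix trie (illustrative baseline, not SPINE).

    Each node is a dict with:
      "children": maps a character to the child node
      "starts":   start positions of every occurrence of the substring
                  spelled by the path from the root to this node
    """
    root = {"children": {}, "starts": []}
    for start in range(len(text)):
        node = root
        # Insert the suffix text[start:] one character at a time.
        for ch in text[start:]:
            node = node["children"].setdefault(ch, {"children": {}, "starts": []})
            node["starts"].append(start)
    return root


def find_occurrences(root, pattern):
    """Return the sorted start positions of a non-empty pattern in the text."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []  # pattern does not occur
    return sorted(node["starts"])


def count_nodes(root):
    """Count all trie nodes, including the root."""
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node["children"].values())
    return total


if __name__ == "__main__":
    trie = build_suffix_trie("banana")
    print(find_occurrences(trie, "ana"))  # [1, 3]
    print(count_nodes(trie))              # 16
```

Even for the 6-character string "banana" this trie needs 16 nodes, and in general the node count can grow quadratically with the string length. That growth is why a compaction scheme, whether vertical or horizontal, is needed before such an index is practical for genome-scale strings.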