The indexing technique commonly used for long strings, such as genomes, is the suffix tree, which is based on a vertical (intra-path) compaction of the underlying trie structure. In this paper, we investigate an alternative approach to index building, based on horizontal (inter-path) compaction of the trie. In particular, we present SPINE, a carefully engineered horizontally-compacted trie index. SPINE consists of a backbone formed by a linear chain of nodes representing the underlying string, with the nodes connected by a rich set of edges that facilitate fast forward and backward traversals over the backbone during index construction and query search. A special feature of SPINE is that it collapses the trie into a linear structure, representing the logical extreme of horizontal compaction. We describe algorithms for SPINE construction and for searching this index to find the occurrences of query patterns. Our experimental results on a variety of real genomic and proteomic strings show that SPINE requires significantly less space than standard implementations of suffix trees. Further, SPINE takes less time than suffix trees for both construction and search, especially when the index is disk-resident. Finally, the linearity of its structure makes it more amenable to integration with database engines.