We investigate indexing techniques for sequence data, which are crucial in a wide variety of applications that require efficient, scalable, and versatile search algorithms. Recent research has focused on suffix trees (ST) and suffix arrays (SA) as desirable index representations. For very long sequences, however, existing solutions provide either efficient index construction or efficient search, but not both. We propose a new ST representation, STTD64, which has reasonable construction time and storage requirements and supports efficient search. We have implemented the construction and search algorithms for the proposed technique and conducted numerous experiments to evaluate its performance on various types of real sequence data. Our results show that while the construction time of STTD64 is comparable to that of current ST-based techniques, it outperforms them in search. Compared to ESA (enhanced suffix array), the best known SA technique, STTD64 is slower to construct but has a similar space requirement and comparable search time. Unlike ESA, which is memory-based, STTD64 is scalable and can handle very long sequences.
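For readers unfamiliar with the index structures named above, the following is a minimal sketch of a plain suffix array with exact-match search by binary search. It is an illustration only, not STTD64 or ESA: the function names `build_suffix_array` and `find_occurrences` are hypothetical, construction is the naive sort-all-suffixes approach, and everything is held in memory, which is exactly the limitation that matters for very long sequences.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Naive construction: sorts the suffixes directly, so it is only suitable
    for short, in-memory sequences (an assumption of this sketch).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the sorted start positions of every exact occurrence of `pattern`."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with the pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    sequence = "ACGTACGA"  # hypothetical toy DNA sequence
    index = build_suffix_array(sequence)
    print(find_occurrences(sequence, index, "ACG"))
```

Each query costs O(m log n) character comparisons for a pattern of length m over a sequence of length n. The trade-offs discussed in the abstract (construction time, space, and scalability beyond main memory) are what separate this toy version from practical ST and SA indexes.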