The Boyer-Moore-Galil string searching strategies revisited
SIAM Journal on Computing
Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Efficient implementation of suffix trees
Software—Practice & Experience
Algorithms on strings, trees, and sequences: computer science and computational biology
PATRICIA—Practical Algorithm To Retrieve Information Coded in Alphanumeric
Journal of the ACM (JACM)
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
A fast string searching algorithm
Communications of the ACM
A Database Index to Large Biological Sequences
Proceedings of the 27th International Conference on Very Large Data Bases
The suffix binary search tree and suffix AVL tree
Journal of Discrete Algorithms
OASIS: an online and accurate technique for local-alignment searches on biological sequences
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Practical suffix tree construction
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Efficient and scalable indexing techniques for biological sequence data
BIRD'07 Proceedings of the 1st international conference on Bioinformatics research and development
We study indexing techniques that support efficient exact-match search in large biological sequence databases. We propose a suffix tree (ST) representation, called STA-DF, as an alternative to the array representation of ST (STA) proposed in [7] and used in [18]. To study the performance of STA and STA-DF, we develop a memory-efficient ST-based Exact Match (STEM) search algorithm. We implemented STEM and both ST representations and conducted extensive experiments. Our results indicate that STA and STA-DF are very similar in construction time, storage utilization, and search time using STEM. In terms of STEM's access patterns, however, STA-DF exhibits better spatial and sequential locality of reference than STA. This suggests that STA-DF would require fewer disk I/Os, and hence is more amenable to efficient and scalable disk-based computation.
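As a rough illustration of the general idea only, the sketch below builds a naive, uncompressed suffix trie, stores its nodes in a single list in depth-first (preorder) order, and answers exact-match queries by walking that list. It is not STEM, STA, or STA-DF, whose layouts the abstract does not specify; the function names `build_suffix_trie`, `flatten_depth_first`, and `contains` are hypothetical, and the quadratic construction is only suitable for short strings.

```python
def build_suffix_trie(text):
    """Naive O(n^2) suffix trie as nested dicts (char -> child node).

    Assumption: an uncompressed trie stands in for a real suffix tree.
    """
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def flatten_depth_first(root):
    """Store nodes in one list in depth-first (preorder) order.

    Each list entry maps a character to the list index of the child node.
    Recursion depth equals the text length, so keep inputs short.
    """
    nodes = []

    def visit(node):
        idx = len(nodes)
        nodes.append({})
        for ch in sorted(node):
            nodes[idx][ch] = visit(node[ch])
        return idx

    visit(root)
    return nodes


def contains(nodes, pattern):
    """Exact-match test: follow the pattern from the root (index 0)."""
    idx = 0
    for ch in pattern:
        nxt = nodes[idx].get(ch)
        if nxt is None:
            return False
        idx = nxt
    return True


nodes = flatten_depth_first(build_suffix_trie("GATTACA"))
print(contains(nodes, "TTAC"))
print(contains(nodes, "TAG"))
```

With a preorder layout, every child sits at a higher index than its parent and the first child sits immediately after it, so a search moves forward through the list. That is one plausible reading of why a depth-first ordering could improve sequential locality, stated here as an assumption rather than as the paper's analysis.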