Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
The string B-tree: a new data structure for string search in external memory and its applications
Journal of the ACM (JACM)
Reducing the space requirement of suffix trees
Software—Practice & Experience
Database System Implementation
Database System Implementation
A Database Index to Large Biological Sequences
Proceedings of the 27th International Conference on Very Large Data Bases
Breaking a Time-and-Space Barrier in Constructing Full-Text Indices
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Engineering a Fast Online Persistent Suffix Tree Construction
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Constructing Suffix Tree for Gigabyte Sequences with Megabyte Memory
IEEE Transactions on Knowledge and Data Engineering
Practical methods for constructing suffix trees
The VLDB Journal — The International Journal on Very Large Data Bases
A taxonomy of suffix array construction algorithms
ACM Computing Surveys (CSUR)
Genome-scale disk-based suffix tree indexing
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Practical suffix tree construction
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Improving suffix array locality for fast pattern matching on disk
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Distributed and paged suffix trees for large genetic databases
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Linear-time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Simple linear work suffix array construction
ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Suffix trees for very large genomic sequences
Proceedings of the 18th ACM conference on Information and knowledge management
Suffix trees for inputs larger than main memory
Information Systems
Hi-index | 0.00 |
We propose a new method to build persistent suffix trees for indexing the genomic data. Our algorithm DiGeST (Disk-Based Genomic Suffix Tree) improves significantly over previous work in reducing the random access to the input string and performing only two passes over disk data. DiGeST is based on the two-phase multi-way merge sort paradigm using a concise binary representation of the DNA alphabet. Furthermore, our method scales to larger genomic data than managed before.