Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
Database indexing for large DNA and protein sequence collections
The VLDB Journal — The International Journal on Very Large Data Bases
Handbook of Exact String Matching Algorithms
Handbook of Exact String Matching Algorithms
Boosting textual compression in optimal linear time
Journal of the ACM (JACM)
Practical methods for constructing suffix trees
The VLDB Journal — The International Journal on Very Large Data Bases
Dynamic text and static pattern matching
ACM Transactions on Algorithms (TALG)
A taxonomy of suffix array construction algorithms
ACM Computing Surveys (CSUR)
A new suffix tree similarity measure for document clustering
Proceedings of the 16th international conference on World Wide Web
Genome-scale disk-based suffix tree indexing
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Practical suffix tree construction
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
iSAX: disk-aware mining and indexing of massive time series datasets
Data Mining and Knowledge Discovery
Serial and parallel methods for i/o efficient suffix tree construction
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Suffix trees for very large genomic sequences
Proceedings of the 18th ACM conference on Information and knowledge management
Indexing genomic sequences on the IBM Blue Gene
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
High Throughput Short Read Alignment via Bi-directional BWT
BIBM '09 Proceedings of the 2009 IEEE International Conference on Bioinformatics and Biomedicine
Efficient Periodicity Mining in Time Series Databases Using Suffix Trees
IEEE Transactions on Knowledge and Data Engineering
Efficient parallel construction of suffix trees for genomes larger than main memory
Proceedings of the 20th European MPI Users' Group Meeting
String analysis by sliding positioning strategy
Journal of Computer and System Sciences
RACE: a scalable and elastic parallel system for discovering repeats in very long sequences
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
The suffix tree is a data structure for indexing strings. It is used in a variety of applications such as bioinformatics, time series analysis, clustering, text editing and data compression. However, when the string and the resulting suffix tree are too large to fit into the main memory, most existing construction algorithms become very inefficient. This paper presents a disk-based suffix tree construction method, called Elastic Range (ERa), which works efficiently with very long strings that are much larger than the available memory. ERa partitions the tree construction process horizontally and vertically and minimizes I/Os by dynamically adjusting the horizontal partitions independently for each vertical partition, based on the evolving shape of the tree and the available memory. Where appropriate, ERa also groups vertical partitions together to amortize the I/O cost. We developed a serial version; a parallel version for shared-memory and shared-disk multi-core systems; and a parallel version for shared-nothing architectures. ERa indexes the entire human genome in 19 minutes on an ordinary desktop computer. For comparison, the fastest existing method needs 15 minutes using 1024 CPUs on an IBM BlueGene supercomputer.