The string B-tree: a new data structure for string search in external memory and its applications
Journal of the ACM (JACM)
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
On the sorting-complexity of suffix tree construction
Journal of the ACM (JACM)
A Database Index to Large Biological Sequences
Proceedings of the 27th International Conference on Very Large Data Bases
COCOON '96 Proceedings of the Second Annual International Conference on Computing and Combinatorics
Suffix Trees (and Relatives) Come of Age in Bioinformatics
CSB '02 Proceedings of the IEEE Computer Society Conference on Bioinformatics
Optimal suffix tree construction with large alphabets
FOCS '97 Proceedings of the 38th Annual Symposium on Foundations of Computer Science
Overcoming the Memory Bottleneck in Suffix Tree Construction
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
A parallel algorithm for the extraction of structured motifs
Proceedings of the 2004 ACM symposium on Applied computing
Constructing chromosome scale suffix trees
APBC '04 Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29
Engineering a Fast Online Persistent Suffix Tree Construction
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Constructing Suffix Tree for Gigabyte Sequences with Megabyte Memory
IEEE Transactions on Knowledge and Data Engineering
Linear time algorithms for finding and representing all the tandem repeats in a string
Journal of Computer and System Sciences
Practical methods for constructing suffix trees
The VLDB Journal — The International Journal on Very Large Data Bases
Practical suffix tree construction
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Distributed and paged suffix trees for large genetic databases
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Search-Optimized suffix-tree storage for biological applications
HiPC'05 Proceedings of the 12th international conference on High Performance Computing
PSIST: A scalable approach to indexing protein structures using suffix trees
Journal of Parallel and Distributed Computing
Improving suffix array locality for fast pattern matching on disk
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
A new method for indexing genomes using on-disk suffix trees
Proceedings of the 17th ACM conference on Information and knowledge management
Reducing Space Requirements for Disk Resident Suffix Arrays
DASFAA '09 Proceedings of the 14th International Conference on Database Systems for Advanced Applications
OLAP on search logs: an infrastructure supporting data-driven applications in search engines
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Serial and parallel methods for i/o efficient suffix tree construction
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
AS-index: a structure for string search using n-grams and algebraic signatures
Proceedings of the 18th ACM conference on Information and knowledge management
Suffix trees for very large genomic sequences
Proceedings of the 18th ACM conference on Information and knowledge management
Indexing genomic sequences on the IBM Blue Gene
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Anchoring millions of distinct reads on the human genome within seconds
Proceedings of the 13th International Conference on Extending Database Technology
Suffix tree construction algorithms on modern hardware
Proceedings of the 13th International Conference on Extending Database Technology
I/O efficient algorithms for serial and parallel suffix tree construction
ACM Transactions on Database Systems (TODS)
Frequent tree pattern mining: A survey
Intelligent Data Analysis
Suffix trees for inputs larger than main memory
Information Systems
ERA: efficient serial and parallel suffix tree construction for very long strings
Proceedings of the VLDB Endowment
On-line suffix tree construction with reduced branching
Journal of Discrete Algorithms
Efficient parallel construction of suffix trees for genomes larger than main memory
Proceedings of the 20th European MPI Users' Group Meeting
RACE: a scalable and elastic parallel system for discovering repeats in very long sequences
Proceedings of the VLDB Endowment
Efficient techniques on retrieving bio-information for active U-healthcare
Personal and Ubiquitous Computing
Hi-index | 0.00 |
With the exponential growth of biological sequence databases, it has become critical to develop effective techniques for storing, querying, and analyzing these massive data. Suffix trees are widely used to solve many sequence-based problems, and they can be built in linear time and space, provided the resulting tree fits in main-memory. To index larger sequences, several external suffix tree algorithms have been proposed in recent years. However, they suffer from several problems such as susceptibility to data skew, non-scalability to genome-scale sequences, and non-existence of suffix links, which are crucial in various suffix tree based algorithms. In this paper, we target DNA sequences and propose a novel disk-based suffix tree algorithm called TRELLIS, which effectively scales up to genome-scale sequences. Specifically, it can index the entire human genome using 2GB of memory, in about 4 hours and can recover all its suffix links within 2 hours. TRELLIS was compared to various state-of-the-art persistent disk-based suffix tree construction algorithms, and was shown to outperform the best previous methods, both in terms of indexing time and querying time.