Constructing chromosome scale suffix trees

Authors:
A. L. Brown
Affiliations:
University of Adelaide, South Australia, Australia
Venue:
APBC '04 Proceedings of the second conference on Asia-Pacific bioinformatics - Volume 29
Year:
2004

Citing 5
Cited 6

Algorithms on strings, trees, and sequences: computer science and computational biology

Algorithms on strings, trees, and sequences: computer science and computational biology
A Space-Economical Suffix Tree Construction Algorithm

Journal of the ACM (JACM)
Reducing the space requirement of suffix trees

Software—Practice & Experience
Suffix vector: space- and time-efficient alternative to suffix trees

ACSC '02 Proceedings of the twenty-fifth Australasian conference on Computer science - Volume 4
Database indexing for large DNA and protein sequence collections

The VLDB Journal — The International Journal on Very Large Data Bases

Constructing large suffix trees on a computational grid

Journal of Parallel and Distributed Computing
Genome-scale disk-based suffix tree indexing

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Serial and parallel methods for i/o efficient suffix tree construction

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Indexing genomic sequences on the IBM Blue Gene

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
I/O efficient algorithms for serial and parallel suffix tree construction

ACM Transactions on Database Systems (TODS)
Parallel construction of large suffix trees on a PC cluster

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Suffix trees have been the focus of significant research interest as they permit very efficient solutions to a range of string and sequence searching problems. Given a suffix tree that encodes a particular string, it is possible to solve problems such as searching for a specific pattern in time proportional to the length of the pattern rather than the length of the string. Suffix trees can also support inexact matching by dramatically improving the performance of dynamic programming. Therefore, suffix trees may enable a number of large scale bioinformatics problems to be solved more efficiently than previously thought. However, these benefits presume that a suffix tree of sufficient scale can be constructed. An inherent difficulty in suffix tree construction is that the tree construction requires a semi random walk over the tree as it is constructed. Therefore very large trees that will not fit in memory could take an unacceptably long time to construct due to excessive page faulting. In this paper we present a linear time construction algorithm that can construct suffix trees larger than memory using discrete sub-trees. The sub-trees can be constructed in paralel. The perforance of the algorithm is evaluated using suffix trees constructed for chromosomes 1 and 12 of the human genome.