Distributed and paged suffix trees for large genetic databases

Authors:
Raphaël Clifford;Marek Sergot
Affiliations:
Department of Computing, Imperial College, London;Department of Computing, Imperial College, London
Venue:
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Year:
2003



Abstract

We present two new variants of the suffix tree that allow much larger genome sequence databases to be handled efficiently. The method is based on a new linear-time construction algorithm for "sparse" suffix trees, which are subtrees of the whole suffix tree. The new data structures are called the paged suffix tree (PST) and the distributed suffix tree (DST). Both tackle the memory bottleneck by constructing subtrees of the full suffix tree independently: the PST is designed for single-processor environments and the DST for distributed-memory parallel computing environments (e.g. Beowulf clusters). The standard operations on suffix trees of biological importance are shown to be easily translatable to these new data structures. None of these operations on the DST requires interprocess communication, and many have optimal expected parallel running times.
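
The idea of building subtrees of the full suffix tree independently can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: suffixes are partitioned by their first k characters, each partition is indexed by a naive, uncompressed trie built in quadratic time (not the linear-time sparse suffix tree construction the abstract refers to), and all function names are hypothetical. Each subtree depends only on the text and its own list of start positions, so in principle each could be built and queried on a separate page or process.

```python
from collections import defaultdict

LEAF = None  # hypothetical marker key; never collides with a text character


def partition_suffixes(text, k):
    """Group suffix start positions by the first k characters of each suffix.

    Assumption for illustration: fixed-length prefix partitioning.
    """
    groups = defaultdict(list)
    for i in range(len(text)):
        groups[text[i:i + k]].append(i)
    return dict(groups)


def build_sparse_trie(text, starts):
    """Naive uncompressed trie over the selected suffixes only.

    Quadratic time and space in the worst case; a stand-in for a
    sparse suffix tree, not the paper's construction algorithm.
    """
    root = {}
    for s in starts:
        node = root
        for ch in text[s:]:
            node = node.setdefault(ch, {})
        node[LEAF] = s  # record where this suffix begins
    return root


def build_subtrees(text, k):
    """Build one independent subtree per prefix partition."""
    return {prefix: build_sparse_trie(text, starts)
            for prefix, starts in partition_suffixes(text, k).items()}


def _collect(node):
    """Return every suffix start position stored below this node."""
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key is LEAF:
                found.append(child)
            else:
                stack.append(child)
    return found


def _find_in_subtree(trie, pattern):
    """Walk the pattern down one subtree and report matching positions."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    return _collect(node)


def find(subtrees, pattern, k):
    """Report all occurrences of pattern, consulting only relevant subtrees."""
    if len(pattern) >= k:
        # Exactly one partition can contain the matches.
        trie = subtrees.get(pattern[:k])
        return sorted(_find_in_subtree(trie, pattern)) if trie is not None else []
    # Short pattern: every partition whose prefix extends it may match.
    hits = []
    for prefix, trie in subtrees.items():
        if prefix.startswith(pattern):
            hits.extend(_find_in_subtree(trie, pattern))
    return sorted(hits)
```

In this sketch a query of length at least k touches a single subtree, and shorter queries are answered by merging results from several subtrees with no shared state between them. This is in the spirit of the abstract's remark that operations on the DST need no interprocess communication.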