Scalable parallel suffix array construction

Authors:
Fabian Kulla; Peter Sanders
Affiliations:
Forschungszentrum Karlsruhe, 76344 Eggenstein-Leopoldshafen, Germany; Universität Karlsruhe, 76128 Karlsruhe, Germany
Venue:
Parallel Computing
Year:
2007

Abstract

Suffix arrays are a simple and powerful data structure for text processing that can be used for full-text indexes, data compression, and many other applications, in particular in bioinformatics. We describe the first implementation and experimental evaluation of a scalable parallel algorithm for suffix array construction. The implementation works on distributed-memory computers using MPI. Experiments with up to 512 processors show good constant factors and suggest that the algorithm could also be adapted to even larger systems. This makes it possible to build suffix arrays for huge inputs very quickly. Our algorithm is a parallelization of the linear-time DC3 algorithm.
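
To make the data structure concrete, the following is a minimal sequential sketch in Python. It is not the parallel MPI implementation described above and it is not DC3: the suffix array is built with a plain comparison sort of the suffixes, which is easy to read but far from linear time. The function names are hypothetical and chosen for illustration only. The helper `dc3_sample_positions` merely lists the positions i with i mod 3 != 0, which is the sample of suffixes that DC3 sorts recursively before handling the rest.

```python
def naive_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Illustration only: a plain comparison sort of the suffixes,
    not the linear-time DC3 algorithm.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def dc3_sample_positions(n):
    """Positions i with i mod 3 != 0 (the suffixes DC3 sorts recursively)."""
    return [i for i in range(n) if i % 3 != 0]


def find_occurrences(text, sa, pattern):
    """Use binary search on the suffix array to find every occurrence of pattern."""
    m = len(pattern)

    # Leftmost suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Leftmost suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = naive_suffix_array(text)
    print(sa)                                 # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))  # [1, 3]
    print(dc3_sample_positions(len(text)))    # [1, 2, 4, 5]
```

The search step shows why a suffix array can serve as a full-text index: all suffixes that start with a given pattern occupy one contiguous range of the array, so two binary searches locate every occurrence.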