Parallel suffix array construction for shared memory architectures

Authors:
Vitaly Osipov
Affiliations:
Karlsruhe Institute of Technology, Germany
Venue:
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Year:
2012

Abstract

We present the design of an algorithm for constructing the suffix array of a string on manycore GPUs. Despite the wide use of suffix arrays in text processing and extensive research over two decades, there has been a lack of efficient algorithms able to exploit shared memory parallelism (on multicore CPUs as well as manycore GPUs) in practice. To the best of our knowledge, we have developed the first approach exposing shared memory parallelism that significantly outperforms state-of-the-art implementations for sufficiently large inputs. We reduce the suffix array construction problem to a number of parallel primitives, such as prefix sum, radix sorting, and random gather and scatter from and to memory. Thus, the performance of the algorithm depends only on the performance of these primitives on the particular shared memory architecture. We demonstrate its performance on manycore GPUs, but the method can also be applied to other parallel architectures, such as multicores, CELL, or Intel MIC.
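To illustrate what a reduction to such primitives can look like, here is a minimal sequential sketch in Python. It is not the paper's algorithm or implementation: it assumes a generic prefix-doubling scheme, and the function name `suffix_array` and all variable names are hypothetical. Each step is written so that it maps onto one primitive named in the abstract (gather, sort, prefix sum, scatter). Python's built-in `sorted` stands in for a parallel radix sort.

```python
from itertools import accumulate


def suffix_array(text):
    """Prefix-doubling suffix array (illustrative sketch, not the paper's code).

    Every round uses only gather, sort, prefix sum, and scatter steps.
    """
    n = len(text)
    if n == 0:
        return []

    # Initial rank of each suffix: the code point of its first character.
    rank = [ord(c) for c in text]
    h = 1
    while True:
        # Gather: read the rank of the suffix starting h positions later.
        # -1 marks "past the end" and sorts before every real rank.
        second = [rank[i + h] if i + h < n else -1 for i in range(n)]

        # Sort: order suffix start positions by the key pair (rank, second).
        # Assumption: sorted() stands in for a parallel radix sort.
        sa = sorted(range(n), key=lambda i: (rank[i], second[i]))

        # Flag each sorted position whose key differs from its predecessor.
        flags = [0] + [
            int(
                (rank[sa[j]], second[sa[j]])
                != (rank[sa[j - 1]], second[sa[j - 1]])
            )
            for j in range(1, n)
        ]

        # Prefix sum: running flag counts become the new lexicographic names.
        names = list(accumulate(flags))

        # Scatter: write each new name back to its suffix start position.
        new_rank = [0] * n
        for j, i in enumerate(sa):
            new_rank[i] = names[j]
        rank = new_rank

        # All names distinct means the suffixes are fully sorted.
        if names[-1] == n - 1:
            return sa
        h *= 2


print(suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

In this sketch the sorted prefix length doubles every round, so at most about log2(n) + 1 rounds are needed. On a parallel machine each round's cost would be dominated by the sort and the memory traffic of the gather and scatter steps, which is consistent with the abstract's remark that performance depends on these primitives.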