Trends in suffix sorting: a survey of low memory algorithms

Authors:
Jasbir Dhaliwal; Simon J. Puglisi; Andrew Turpin
Affiliations:
RMIT University, Melbourne, Australia; RMIT University, Melbourne, Australia; University of Melbourne, Melbourne, Australia
Venue:
ACSC '12 Proceedings of the Thirty-fifth Australasian Computer Science Conference - Volume 122
Year:
2012

References:
Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing.
Algorithms on strings, trees, and sequences: computer science and computational biology.
Fast algorithms for sorting and searching strings. SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms.
An analysis of the Burrows-Wheeler transform. Journal of the ACM (JACM).
Breaking a Time-and-Space Barrier in Constructing Full-Text Indices. FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science.
Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms - SPIRE 2002.
Engineering a Lightweight Suffix Array Construction Algorithm. Algorithmica.
Indexing compressed text. Journal of the ACM (JACM).
Compressed full-text indexes. ACM Computing Surveys (CSUR).
A taxonomy of suffix array construction algorithms. ACM Computing Surveys (CSUR).
Optimal suffix selection. Proceedings of the thirty-ninth annual ACM symposium on Theory of computing.
Fast BWT in small space by blockwise suffix sorting. Theoretical Computer Science.
Linear Suffix Array Construction by Almost Pure Induced-Sorting. DCC '09 Proceedings of the 2009 Data Compression Conference.
Compressed Suffix Arrays for Massive Data. SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval.
A Linear-Time Burrows-Wheeler Transform Using Induced Sorting. SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval.
On compressing the textual web. Proceedings of the third ACM international conference on Web search and data mining.
Fast lightweight suffix array construction and checking. CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching.
Top-k ranked document search in general text databases. ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II.
Finding the maximum suffix with fewer comparisons. Journal of Discrete Algorithms.
Inverted indexes for phrases and strings. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval.
Lightweight BWT construction for very large string collections. CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching.
Lightweight data indexing and compression in external memory. LATIN'10 Proceedings of the 9th Latin American conference on Theoretical Informatics.
Linear-Time construction of compressed suffix arrays using o(n log n)-bit working space for large alphabets. CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching.
In-place suffix sorting. ICALP'07 Proceedings of the 34th international conference on Automata, Languages and Programming.
Suffix Array Construction in External Memory Using D-Critical Substrings. ACM Transactions on Information Systems (TOIS).

Abstract

The suffix array is a lexicographically sorted array of all the suffixes of a string. This remarkably simple data structure is fundamental to string processing and lies at the heart of efficient algorithms for pattern matching, pattern mining, and data compression. In many applications, suffix array construction (equivalently, suffix sorting) is a computational bottleneck, and so it has been the focus of intense research over the last 20 years. This paper outlines several suffix array construction algorithms that have emerged since the survey by Puglisi, Smyth and Turpin [ACM Computing Surveys 39, 2007]. These algorithms tend to strive for small working space (RAM), often at the cost of runtime, and use compressed data structures or secondary memory (disk) to achieve this goal. We provide a high-level description of each algorithm, avoiding implementation details as much as possible, and outline directions that could benefit from further research.
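To make the opening definition concrete, the following Python sketch builds a suffix array by sorting suffixes directly. It then uses the array for pattern matching and to read off the Burrows-Wheeler transform used in data compression. This is a naive baseline under stated assumptions, not one of the algorithms surveyed in the paper. The function names are hypothetical, and the BWT helper assumes the text ends with a unique sentinel (here `$`) that sorts before every other character.

```python
from typing import List


def naive_suffix_array(text: str) -> List[int]:
    """Starting positions of all suffixes of `text`, in lexicographic order.

    Naive baseline: the key function materialises every suffix, so working
    memory grows quadratically with len(text).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Start positions of `pattern` in `text`, found by binary search over `sa`.

    Suffixes beginning with `pattern` form one contiguous block of the suffix
    array, so two binary searches locate the block's boundaries.
    """
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def bwt_from_sa(text: str, sa: List[int]) -> str:
    """Burrows-Wheeler transform read off the suffix array.

    Assumes `text` ends with a unique sentinel smaller than every other
    character. BWT[i] is the character preceding suffix sa[i]; when
    sa[i] == 0, text[-1] (the sentinel) is used.
    """
    return "".join(text[i - 1] for i in sa)


if __name__ == "__main__":
    text = "banana$"  # '$' is a unique sentinel smaller than every letter
    sa = naive_suffix_array(text)
    print(sa)                                 # [6, 5, 3, 1, 0, 4, 2]
    print(bwt_from_sa(text, sa))              # annb$aa
    print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

Because `sorted` computes every key up front, this version needs memory quadratic in the text length. Practical construction algorithms avoid that cost, and the ones surveyed here go further in reducing working space.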