Suffix arrays are a simple and powerful data structure for text processing that can be used for full-text indexes, data compression, and many other applications, particularly in bioinformatics. So far, however, building suffix arrays for huge inputs that do not fit into main memory has appeared prohibitive. This paper presents the design, analysis, implementation, and experimental evaluation of several new and improved algorithms for suffix array construction. The algorithms are asymptotically optimal in the worst case or on average. Our implementation can construct suffix arrays for inputs of up to 4 GB in hours on a low-cost machine. As a tool of possible independent interest, we present a systematic way to design, analyze, and implement pipelined algorithms.
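For readers unfamiliar with the data structure, the following is a minimal in-memory sketch of what a suffix array is and how it supports substring search. It is not one of the external-memory algorithms described above: the function names `naive_suffix_array` and `find_occurrences` are hypothetical, and the construction simply sorts all suffixes, which is only practical for small inputs that fit in main memory.

```python
def naive_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, sorted lexicographically.

    Illustrative only: sorting whole suffixes takes far more time and memory
    than the construction algorithms the abstract refers to.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """Return all start positions of `pattern` in `text` using binary search on `sa`."""
    m = len(pattern)

    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with the pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = naive_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Because the suffixes are stored in sorted order, all occurrences of a pattern occupy one contiguous range of the array, which is why two binary searches suffice for the lookup.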