An Efficient Method for in Memory Construction of Suffix Arrays

Authors:
Hideo Itoh;Hozumi Tanaka
Affiliations:
-;-
Venue:
SPIRE '99 Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware
Year:
1999

Citing 0
Cited 16

Engineering a Lightweight Suffix Array Construction Algorithm
ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms

Searching large text collections
Handbook of massive data sets

Compressed full-text indexes
ACM Computing Surveys (CSUR)

A taxonomy of suffix array construction algorithms
ACM Computing Surveys (CSUR)

An efficient, versatile approach to suffix sorting
Journal of Experimental Algorithmics (JEA)

Fast BWT in small space by blockwise suffix sorting
Theoretical Computer Science

Faster suffix sorting
Theoretical Computer Science

Fast lightweight suffix array construction and checking
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching

Space efficient linear time construction of suffix arrays
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching

Post BWT stages of the Burrows–Wheeler compression algorithm
Software—Practice & Experience

Inducing the LCP-array
WADS'11 Proceedings of the 12th international conference on Algorithms and data structures

Revisiting bounded context block-sorting transformations
Software—Practice & Experience

Optimal lightweight construction of suffix arrays for constant alphabets
WADS'07 Proceedings of the 10th international conference on Algorithms and Data Structures

Parallel suffix array and least common prefix for the GPU
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming

Practical linear-time O(1)-workspace suffix sorting for constant alphabets
ACM Transactions on Information Systems (TOIS)

Faster semi-external suffix sorting
Information Processing Letters

Quantified Score

Hi-index	0.00

Abstract

The suffix array is a string-indexing structure and a memory-efficient alternative to the suffix tree. It has many advantages for text processing. Here we propose an efficient algorithm for sorting suffixes, which we call the two-stage suffix sort. One of our ideas is to exploit the specific relationships between adjacent suffixes. Our algorithm makes it possible to use the suffix array for much larger texts and suggests new areas of application. Our experiments on several text data sets (including 514 MB of Japanese newspaper text) demonstrate that our algorithm is 4.5 to 6.9 times faster than Quicksort, and 2.5 to 3.6 times faster than Sadakane's algorithm, which was considered the fastest of the previously published algorithms.
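
The abstract does not spell out how the relationship between adjacent suffixes is used, so the following Python sketch is only an illustration of one way a two-stage scheme can work, not the authors' implementation. It assumes a classification rule that is not stated above: a suffix starting at position i is treated as "type A" if its first character is strictly greater than the next one (the last suffix also counts as type A, as if followed by a sentinel smaller than every character), and as "type B" otherwise. Under that assumption, only the type B suffixes are sorted by comparison; each type A suffix is then placed by a single left-to-right scan, using the already known rank of the suffix that starts one position later. The function names and the slicing-based comparison sort are hypothetical choices made for brevity.

```python
from collections import Counter


def naive_suffix_array(text):
    """Baseline: sort every suffix with a plain comparison sort."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def two_stage_suffix_array(text):
    """Illustrative two-stage suffix sort (assumed type A / type B rule)."""
    n = len(text)
    if n == 0:
        return []

    # Assumed rule: type A if text[i] > text[i + 1]; the last suffix is
    # type A because it is treated as followed by a smallest sentinel.
    is_type_a = [i == n - 1 or text[i] > text[i + 1] for i in range(n)]

    # Bucket boundaries by first character.
    counts = Counter(text)
    bucket_start, bucket_end, pos = {}, {}, 0
    for c in sorted(counts):
        bucket_start[c] = pos
        pos += counts[c]
        bucket_end[c] = pos

    sa = [-1] * n

    # Stage 1: comparison-sort only the type B suffixes and store them
    # at the tail of their buckets (type A suffixes precede them).
    type_b = sorted((i for i in range(n) if not is_type_a[i]),
                    key=lambda i: text[i:])
    tail = dict(bucket_end)
    for i in reversed(type_b):
        tail[text[i]] -= 1
        sa[tail[text[i]]] = i

    # Stage 2: induce type A suffixes in one left-to-right scan.
    head = dict(bucket_start)
    # The last suffix is the smallest one in its bucket.
    sa[head[text[n - 1]]] = n - 1
    head[text[n - 1]] += 1
    for j in range(n):
        i = sa[j]  # already filled: it lies in an earlier bucket or was sorted
        if i > 0 and is_type_a[i - 1]:
            c = text[i - 1]
            sa[head[c]] = i - 1
            head[c] += 1
    return sa


if __name__ == "__main__":
    print(two_stage_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

In this sketch the comparison sort touches only the type B suffixes, which is where a saving over sorting every suffix would come from; the speedups reported in the abstract refer to the authors' own implementation and data, not to this code.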