In this paper we describe a new algorithm for building the suffix array of a string, a task equivalent to lexicographically sorting all the suffixes of the input string. Our algorithm is based on a new approach called deep–shallow sorting: we use a “shallow” sorter for suffixes with a short common prefix and a “deep” sorter for suffixes with a long common prefix. All known algorithms for building the suffix array either require a large amount of space or are inefficient when the input string contains many repeated substrings; our algorithm has been designed to overcome this dichotomy. It is “lightweight” in the sense that it uses very little space in addition to the space required by the suffix array itself. At the same time, it is fast even when the input contains many repetitions, as shown by extensive experiments with inputs of size up to 110 Mb. The source code of our algorithm, as well as a C library providing a simple API, is available under the GNU GPL.
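
To make the equivalence between suffix array construction and suffix sorting concrete, the following minimal Python sketch builds a suffix array by sorting suffix start positions directly. It is a naive illustration only, not the deep–shallow algorithm described above: it materializes every suffix and so uses far more time and space than a lightweight method would. The function names `suffix_array` and `common_prefix_length` are hypothetical helpers introduced here, not part of the authors' C library.

```python
def suffix_array(text: str) -> list:
    """Return the start positions of all suffixes of `text`,
    ordered so that the suffixes are in lexicographic order.

    Naive approach: each comparison may scan a whole suffix, so this
    is slow on inputs with many long repeated substrings.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_length(text: str, i: int, j: int) -> int:
    """Length of the longest common prefix of the suffixes at i and j."""
    n = len(text)
    k = 0
    while i + k < n and j + k < n and text[i + k] == text[j + k]:
        k += 1
    return k


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa)  # [5, 3, 1, 0, 4, 2]
    # Common prefix lengths of suffixes adjacent in sorted order;
    # long values are what make naive comparison-based sorting slow.
    print([common_prefix_length(text, sa[k - 1], sa[k]) for k in range(1, len(sa))])
    # [1, 3, 0, 0, 2]
```

The second printed list shows how much of each suffix is shared with its neighbour in sorted order. Inputs with many repetitions produce large values here, which is the case the abstract says a “deep” sorter is meant to handle.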