Succinct suffix arrays based on run-length encoding

Authors:
Veli Mäkinen;Gonzalo Navarro
Affiliations:
AG Genominformatik, Technische Fakultät Universität Bielefeld, Germany;Center for Web Research Dept. of Computer Science, University of Chile
Venue:
CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching
Year:
2005

Abstract

A succinct full-text self-index is a data structure built on a text $T = t_1 t_2 \ldots t_n$ that takes little space (ideally close to that of the compressed text), permits efficient search for the occurrences of a pattern $P = p_1 p_2 \ldots p_m$ in $T$, and is able to reproduce any text substring, so the self-index replaces the text. Several remarkable self-indexes have been developed in recent years. They usually take $O(nH_0)$ or $O(nH_k)$ bits, where $H_k$ is the $k$th-order empirical entropy of $T$. The time to count how many times $P$ occurs in $T$ ranges from $O(m)$ to $O(m \log n)$. We present a new self-index, called the run-length FM-index (RLFM index), that counts the occurrences of $P$ in $T$ in $O(m)$ time when the alphabet size is $\sigma=O(\textrm{polylog}(n))$. The index requires $nH_k \log_2 \sigma + O(n)$ bits of space for small $k$. We then show how to implement the RLFM index in practice, and obtain in passing another implementation with different space-time tradeoffs. We empirically compare ours against the best existing implementations of other indexes and show that ours are the fastest among indexes taking less space than the text.
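
To make the counting operation concrete, the following is a minimal Python sketch of counting pattern occurrences by backward search over a run-length encoded Burrows-Wheeler transform. It is an illustration only, not the RLFM index described in the abstract: the function names (`bwt`, `run_length_encode`, `rank`, `count_occurrences`) are hypothetical, the suffix array is built naively, and `rank` scans the runs linearly, so the sketch reaches neither the $O(m)$ counting time nor the stated space bound. It assumes the text does not contain the sentinel character `"\0"`.

```python
from itertools import groupby


def bwt(text, sentinel="\0"):
    """Burrows-Wheeler transform via a naively sorted suffix array."""
    s = text + sentinel  # the sentinel sorts before every other symbol
    sa = sorted(range(len(s)), key=lambda i: s[i:])
    # The symbol preceding each suffix; s[-1] is the sentinel when i == 0.
    return "".join(s[i - 1] for i in sa)


def run_length_encode(s):
    """Collapse maximal runs of equal symbols into (symbol, length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rank(runs, c, i):
    """Number of occurrences of c among the first i symbols of the run-encoded string."""
    seen = 0
    total = 0
    for ch, length in runs:
        if seen >= i:
            break
        if ch == c:
            total += min(length, i - seen)
        seen += length
    return total


def count_occurrences(runs, pattern):
    """Count occurrences of a non-empty pattern by backward search."""
    # counts[c]: how often c occurs; first[c]: number of symbols smaller than c.
    counts = {}
    for ch, length in runs:
        counts[ch] = counts.get(ch, 0) + length
    first = {}
    total = 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]

    sp, ep = 0, total  # half-open range of suffixes matching the processed suffix of pattern
    for c in reversed(pattern):
        if c not in first:
            return 0
        sp = first[c] + rank(runs, c, sp)
        ep = first[c] + rank(runs, c, ep)
        if sp >= ep:
            return 0
    return ep - sp


runs = run_length_encode(bwt("abracadabra"))
print(len(runs), count_occurrences(runs, "abra"))
```

In this sketch the loop performs one step per pattern symbol, which is where the $m$ factor in the counting time comes from; a real index would replace the linear scan in `rank` with a constant-time or logarithmic-time rank structure.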