Statistical encoding of succinct data structures

Authors:
Rodrigo González; Gonzalo Navarro
Affiliations:
Department of Computer Science, University of Chile; Department of Computer Science, University of Chile
Venue:
CPM'06 Proceedings of the 17th Annual conference on Combinatorial Pattern Matching
Year:
2006

References (14)

Text compression.
Compression of Low Entropy Strings with Lempel-Ziv Algorithms. SIAM Journal on Computing.
An analysis of the Burrows-Wheeler transform. Journal of the ACM (JACM).
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. SODA '02: Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms.
High-order entropy-compressed text indexes. SODA '03: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms.
Tables. Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science.
Succinct representation of balanced parentheses, static trees and planar graphs. FOCS '97: Proceedings of the 38th Annual Symposium on Foundations of Computer Science.
Opportunistic data structures with applications. FOCS '00: Proceedings of the 41st Annual Symposium on Foundations of Computer Science.
Indexing text using the Ziv-Lempel trie. Journal of Discrete Algorithms, SPIRE 2002.
Structuring labeled trees for optimal succinctness, and beyond. FOCS '05: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science.
Squeezing succinct data structures into entropy bounds. SODA '06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms.
Compressing and searching XML data via two zips. Proceedings of the 15th International Conference on World Wide Web.
Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing.
Succinct representations of permutations. ICALP '03: Proceedings of the 30th International Conference on Automata, Languages and Programming.

Cited by (15)

Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms (TALG).
Dynamic entropy-compressed sequences and full-text indexes. ACM Transactions on Algorithms (TALG).
Compressed text indexes: From theory to practice. Journal of Experimental Algorithmics (JEA).
Simple Random Access Compression. Fundamenta Informaticae.
Wee LCP. Information Processing Letters.
A web search engine model based on index-query bit-level compression. Proceedings of the 1st International Conference on Intelligent Semantic Web-Services and Applications.
Optimal trade-offs for succinct string indexes. ICALP '10: Proceedings of the 37th International Colloquium on Automata, Languages and Programming.
Spatio-temporal range searching over compressed kinetic sensor data. ESA '10: Proceedings of the 18th Annual European Conference on Algorithms, Part I.
Succinct indexes for strings, binary relations and multilabeled trees. ACM Transactions on Algorithms (TALG).
Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays. SIAM Journal on Computing.
CRAM: compressed random access memory. ICALP '12: Proceedings of the 39th International Colloquium on Automata, Languages, and Programming, Volume Part I.
Simple Random Access Compression. Fundamenta Informaticae.
Compressed text indexes with fast locate. CPM '07: Proceedings of the 18th Annual Conference on Combinatorial Pattern Matching.
Improved address-calculation coding of integer arrays. SPIRE '12: Proceedings of the 19th International Conference on String Processing and Information Retrieval.
Development of a Novel Compressed Index-Query Web Search Engine Model. International Journal of Information Technology and Web Engineering.

Abstract

In recent work, Sadakane and Grossi [SODA 2006] introduced a scheme to represent any sequence $S = s_1 s_2 \ldots s_n$ over an alphabet of size $\sigma$ using $nH_k(S) + O\left(\frac{n}{\log_\sigma n}(k \log \sigma + \log\log n)\right)$ bits of space, where $H_k(S)$ is the $k$-th order empirical entropy of $S$. The representation permits extracting any substring of length $\Theta(\log_\sigma n)$ in constant time, and thus it completely replaces $S$ under the RAM model. This is extremely important because it permits converting any succinct data structure requiring $o(|S|) = o(n \log \sigma)$ bits in addition to $S$ into another one requiring $nH_k(S) + o(n \log \sigma)$ bits overall, for any $k = o(\log_\sigma n)$. They achieve this result by using Ziv-Lempel compression, and conjecture that the result can be particularly useful for implementing compressed full-text indexes. In this paper we extend their result by obtaining the same space and time complexities using a simpler scheme based on statistical encoding. We show that the scheme supports appending symbols in constant amortized time. In addition, we prove some results on the applicability of the scheme to full-text self-indexing.
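To make the two ingredients of the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' construction. `empirical_entropy` computes the standard definition $H_k(S) = \frac{1}{n}\sum_{w \in \Sigma^k} |S_w|\, H_0(S_w)$, where $S_w$ is the sequence of symbols that follow occurrences of the context $w$. The hypothetical class `BlockEncodedSequence` illustrates blockwise statistical encoding with random access. It assumes a semi-static $k$-th order model with one Huffman code per length-$k$ context (the paper's actual coder and its constant-time decoding tables are not reproduced). It also assumes that the first $k$ symbols of each block are stored verbatim so that every block decodes on its own, and it keeps a plain array of bit offsets instead of a space-efficient pointer scheme. With blocks of $b = \Theta(\log_\sigma n)$ symbols, the verbatim heads add up to $O\left(\frac{n}{\log_\sigma n} k \log \sigma\right)$ bits, the same form as the first redundancy term above. All names here (`empirical_entropy`, `huffman_code`, `BlockEncodedSequence`, `extract_block`, `extract`) are illustrative.

```python
import heapq
import math
from collections import Counter, defaultdict
from itertools import count


def empirical_entropy(s, k):
    """H_k(s) in bits per symbol for non-empty s: (1/n) * sum over w of |S_w| * H_0(S_w)."""
    followers = defaultdict(Counter)  # context w -> counts of symbols following w
    for i in range(k, len(s)):
        followers[s[i - k:i]][s[i]] += 1
    total = 0.0
    for c in followers.values():
        m = sum(c.values())
        total -= sum(f * math.log2(f / m) for f in c.values())
    return total / len(s)


def huffman_code(freqs):
    """Prefix-free code {symbol: bitstring} built from a frequency table."""
    if len(freqs) == 1:
        (sym,) = freqs
        return {sym: ""}  # a context with one possible successor costs 0 bits
    tie = count()  # tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in c1.items()}
        merged.update({sym: "1" + code for sym, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


class BlockEncodedSequence:
    """Hypothetical illustration: S cut into blocks of b symbols, each encoded on its own."""

    def __init__(self, s, k, b):
        if not s or b < 1:
            raise ValueError("need a non-empty sequence and b >= 1")
        self.k, self.b, self.n = k, b, len(s)
        self.alphabet = sorted(set(s))
        rank = {c: i for i, c in enumerate(self.alphabet)}
        self.width = max(1, math.ceil(math.log2(len(self.alphabet))))
        # Semi-static k-th order model gathered over the whole sequence.
        followers = defaultdict(Counter)
        for i in range(k, len(s)):
            followers[s[i - k:i]][s[i]] += 1
        codes = {w: huffman_code(c) for w, c in followers.items()}
        self.decoders = {w: {bits: sym for sym, bits in cw.items()} for w, cw in codes.items()}
        pieces, self.offsets, pos = [], [], 0
        for start in range(0, len(s), b):
            block = s[start:start + b]
            # First k symbols verbatim (fixed width), so the block needs no outside context.
            head = "".join(format(rank[c], f"0{self.width}b") for c in block[:k])
            body = "".join(codes[block[j - k:j]][block[j]] for j in range(k, len(block)))
            self.offsets.append(pos)  # bit position where this block starts
            pieces.append(head + body)
            pos += len(head) + len(body)
        self.bits = "".join(pieces)

    def extract_block(self, t):
        """Decode block t using only its own bits (O(b) steps here, not O(1))."""
        pos, length = self.offsets[t], min(self.b, self.n - t * self.b)
        out = []
        for _ in range(min(self.k, length)):
            out.append(self.alphabet[int(self.bits[pos:pos + self.width], 2)])
            pos += self.width
        while len(out) < length:
            rev = self.decoders["".join(out[-self.k:]) if self.k else ""]
            buf = ""
            while buf not in rev:  # prefix-free: the first match is the symbol
                buf += self.bits[pos]
                pos += 1
            out.append(rev[buf])
        return "".join(out)

    def extract(self, lo, hi):
        """Return s[lo:hi] (0 <= lo <= hi <= n) by decoding only the overlapping blocks."""
        if lo >= hi:
            return ""
        first, last = lo // self.b, (hi - 1) // self.b
        chunk = "".join(self.extract_block(t) for t in range(first, last + 1))
        return chunk[lo - first * self.b:hi - first * self.b]


if __name__ == "__main__":
    text = "abracadabra" * 20
    seq = BlockEncodedSequence(text, k=2, b=8)
    print(seq.extract(13, 29) == text[13:29])
    print(len(seq.bits), "bits; n*H_2(text) is about", round(len(text) * empirical_entropy(text, 2), 1))
```

Each block's code words depend only on that block's own first $k$ symbols, so any block can be decoded without reading its neighbours. This independence is what gives random access. The sketch deliberately trades space for simplicity. Per-context Huffman coding spends less than one extra bit per encoded symbol over $H_0(S_w)$ but can still be far from $nH_k(S)$ on skewed distributions. The offset array also stores a full integer per block rather than the $O(\log\log n)$ bits per block implied by the second redundancy term.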