Simple Random Access Compression
Fundamenta Informaticae
Given a sequence S of n symbols over some alphabet Σ, we develop a new compression method that (i) is very simple to implement; (ii) provides O(1)-time random access to any symbol of the original sequence; and (iii) allows efficient pattern matching over the compressed sequence. Our simplest solution uses at most 2h + o(h) bits of space, where h = n(H0(S) + 1) and H0(S) is the zeroth-order empirical entropy of S. We discuss a number of improvements and trade-offs over the basic method. The new method is applied to text compression. We also propose average-case optimal string matching algorithms.
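To make the space bound concrete, the following minimal sketch computes the zeroth-order empirical entropy H0(S) of a sequence and the leading term 2h of the bound, with h = n(H0(S) + 1). It uses the standard definition H0(S) = Σ (n_c / n) log2(n / n_c), where n_c is the number of occurrences of symbol c. The function names are hypothetical and chosen only for illustration; the sketch does not implement the compression method itself, and it ignores the o(h) term.

```python
import math
from collections import Counter


def zeroth_order_entropy(seq):
    """Return H0(S) in bits per symbol (standard empirical definition)."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    # Sum over distinct symbols c of (n_c / n) * log2(n / n_c).
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def leading_space_bound_bits(seq):
    """Return 2h, where h = n * (H0(S) + 1); the o(h) term is ignored."""
    n = len(seq)
    h = n * (zeroth_order_entropy(seq) + 1)
    return 2 * h


if __name__ == "__main__":
    text = "abracadabra"
    print("H0 =", zeroth_order_entropy(text), "bits/symbol")
    print("2h =", leading_space_bound_bits(text), "bits")
```

For a sequence in which two symbols occur equally often, such as "abab", H0(S) is 1 bit per symbol, so h = 4 · 2 = 8 and the leading term of the bound is 16 bits.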