Given a sequence $S$ of $n$ symbols over an alphabet $\Sigma$ of size $\sigma$, we develop new compression methods that (i) are very simple to implement and (ii) provide $O(1)$-time random access to any symbol (or short substring) of the original sequence. Our simplest solution uses at most $2h + o(h)$ bits of space, where $h = n(H_0(S) + 1)$ and $H_0(S)$ is the zeroth-order empirical entropy of $S$. We discuss a number of improvements and trade-offs over the basic method. For example, we can achieve $n(H_k(S) + 1) + o(n(H_k(S) + 1))$ bits of space for $k = o(\log_\sigma n)$. Several applications are discussed, including text compression, (compressed) full-text indexing, and string matching.
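To make the quantities in the abstract concrete, the following is a minimal sketch, not the authors' construction, which the abstract does not spell out. It computes the zeroth-order empirical entropy and the budget h = n(H0(S) + 1), then builds one simple code that fits within 2h bits: symbols are ranked by decreasing frequency, the symbol of rank r receives the binary expansion of r + 2 without its leading 1, and codeword boundaries are recorded separately. The names `empirical_entropy_h0`, `RandomAccessCode`, `bits`, and `starts` are hypothetical, and the plain Python list `starts` is an assumed stand-in for a marker bit-vector with constant-time select, so the sketch itself is not succinct.

```python
import math
from collections import Counter


def empirical_entropy_h0(seq):
    """Zeroth-order empirical entropy H_0(S), in bits per symbol."""
    n = len(seq)
    counts = Counter(seq)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


class RandomAccessCode:
    """Illustrative frequency-ranked code with explicit codeword boundaries."""

    def __init__(self, seq):
        counts = Counter(seq)
        # Most frequent symbol gets rank 0; ties are broken by symbol order.
        ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        self.symbols = [s for s, _ in ordered]
        rank = {s: r for r, s in enumerate(self.symbols)}

        self.bits = []    # concatenated codewords (payload)
        self.starts = []  # offset of each codeword; stand-in for select()
        for s in seq:
            self.starts.append(len(self.bits))
            # Codeword for rank r: binary of r + 2 with the leading 1 removed,
            # giving 0, 1, 00, 01, 10, 11, 000, ...
            self.bits.extend(int(b) for b in bin(rank[s] + 2)[3:])
        self.starts.append(len(self.bits))  # sentinel: end of last codeword

    def access(self, i):
        """Return the i-th symbol (0-based) without decoding its neighbours."""
        lo, hi = self.starts[i], self.starts[i + 1]
        value = 1  # restore the implicit leading 1
        for b in self.bits[lo:hi]:
            value = (value << 1) | b
        return self.symbols[value - 2]


if __name__ == "__main__":
    text = "abracadabra"
    code = RandomAccessCode(text)
    h = len(text) * (empirical_entropy_h0(text) + 1)
    print("payload bits:", len(code.bits), " budget 2h:", round(2 * h, 2))
    print("symbol at position 4:", code.access(4))
```

In this sketch a codeword of rank r has floor(log2(r + 2)) bits, which is at most log2(1/p) + 1 for a symbol of empirical probability p, so the payload stays within h bits; a marker bit-vector of the same length would bring the total to at most 2h bits before the lower-order overhead of a select structure. Each codeword has O(log σ) bits and would fit in a machine word, so with constant-time select the lookup would take constant time.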