A locally adaptive data compression scheme
Communications of the ACM
Linear approximation of shortest superstrings
Journal of the ACM (JACM)
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
Proof verification and the hardness of approximation problems
Journal of the ACM (JACM)
An analysis of the Burrows—Wheeler transform
Journal of the ACM (JACM)
Burrows--Wheeler transform and Sturmian words
Information Processing Letters
Higher Compression from the Burrows-Wheeler Transform by Modified Sorting
DCC '98 Proceedings of the Conference on Data Compression
Boosting textual compression in optimal linear time
Journal of the ACM (JACM)
Applied Combinatorics on Words (Encyclopedia of Mathematics and its Applications)
Applied Combinatorics on Words (Encyclopedia of Mathematics and its Applications)
Structuring labeled trees for optimal succinctness, and beyond
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
ACM Computing Surveys (CSUR)
A taxonomy of suffix array construction algorithms
ACM Computing Surveys (CSUR)
Data structures: time, I/Os, entropy, joules!
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II
Hi-index | 5.23 |
We introduce a combinatorial optimization framework that naturally induces a class of optimal word permutations with respect to a suitably defined cost function taking into account various measures of relatedness between words. The Burrows and Wheeler transform (bwt) (cf. [M. Burrows, D. Wheeler, A block sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, 1994]), and its analog for labelled trees (cf. [P. Ferragina, F. Luccio, G. Manzini, S. Muthukrishnan, Structuring labeled trees for optimal succinctness, and beyond, in: Proc. of the 45th Annual IEEE Symposium on Foundations of Computer Science, 2005, pp. 198-207]), are special cases in the class. We also show that the class of optimal word permutations defined here is identical to the one identified by Ferragina et al. for compression boosting [P. Ferragina, R. Giancarlo, G. Manzini, M. Sciortino, Boosting textual compression in optimal linear time, Journal of the ACM 52 (2005) 688-713]. Therefore, they are all highly compressible. We also provide, by using techniques from Combinatorics on Words, a fast method to compute bwt without using any end-of-string symbol. We also investigate more general classes of optimal word permutations, where relatedness of symbols may be measured by functions more complex than context length. For this general problem we provide an instance that is MAX-SNP hard, and therefore unlikely to be solved or approximated efficiently. The results presented here indicate that a key feature of the Burrows and Wheeler transform seems to be, besides compressibility, the existence of efficient algorithms for its computation and inversion.