A locally adaptive data compression scheme
Communications of the ACM
Sturmian words: structure, combinatorics, and their arithmetics
Theoretical Computer Science - Special issue: formal language theory
Balanced sequences and optimal routing
Journal of the ACM (JACM)
Fraenkel's conjecture for six sequences
Discrete Mathematics
Episturmian words and some constructions of de Luca and Rauzy
Theoretical Computer Science
An analysis of the Burrows-Wheeler transform
Journal of the ACM (JACM)
Episturmian words and episturmian morphisms
Theoretical Computer Science
Invited Lecture: The Burrows-Wheeler Transform: Theory and Practice
MFCS '99 Proceedings of the 24th International Symposium on Mathematical Foundations of Computer Science
Minimizing Service and Operation Costs of Periodic Scheduling
Mathematics of Operations Research
Burrows-Wheeler transform and Sturmian words
Information Processing Letters
Characterisations of balanced words via orderings
Theoretical Computer Science
Boosting textual compression in optimal linear time
Journal of the ACM (JACM)
The engineering of a compression boosting library: theory vs practice in BWT compression
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
A simpler analysis of Burrows–Wheeler-based compression
Theoretical Computer Science
European Journal of Combinatorics
A new characteristic property of rich words
Theoretical Computer Science
Burrows-Wheeler transform and palindromic richness
Theoretical Computer Science
Balanced Words Having Simple Burrows-Wheeler Transform
DLT '09 Proceedings of the 13th International Conference on Developments in Language Theory
On a generalization of Christoffel words: epichristoffel words
Theoretical Computer Science
A universal algorithm for sequential data compression
IEEE Transactions on Information Theory
Compression of individual sequences via variable-rate coding
IEEE Transactions on Information Theory
Move-to-front, distance coding, and inversion frequencies revisited
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Palindromic richness for languages invariant under more symmetries
Theoretical Computer Science
Compression algorithms based on the Burrows-Wheeler transform (BWT) exploit the fact that the word output by the BWT exhibits local similarity and is therefore highly compressible. The aim of the present paper is to study this "clustering effect" using notions and methods from Combinatorics on Words. The notion of balance of a word plays a central role in our investigation. Empirical observations suggest that balance is the combinatorial property of the input word that ensures optimal BWT compression. Moreover, it is reasonable to assume that the more balanced the input word is, the more local similarity its BWT output shows, and therefore the better the compression. This hypothesis is corroborated here by experiments on "real" text, using local entropy as a measure of the degree of balance of a word. In the setting of Combinatorics on Words, a sound confirmation of this hypothesis is given by a result of Mantaci et al. (2003) [27], which states that, for a binary alphabet, three classes of words coincide: circularly balanced words, words having a clustered BWT, and the conjugates of standard words. For alphabets of size greater than two, this equivalence no longer holds. The last section of the present paper is devoted to investigating the relationships between these notions and other related ones (for instance, palindromic richness) in the case of a general alphabet.
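The binary-alphabet equivalence can be checked by hand on small words. The following is a minimal Python sketch, not taken from the paper: the function names `bwt`, `is_circularly_balanced`, and `runs` are hypothetical, and the BWT is assumed to be the rotation-based variant (last column of the sorted conjugates, with no end-of-string marker), as is usual in Combinatorics on Words. The word `abaababa` is a standard (finite Fibonacci) word; `aabb` is chosen as a word that is not circularly balanced.

```python
def bwt(word):
    """Return the last column of the lexicographically sorted rotations of word.

    Assumption: rotation-based BWT without an end marker.
    """
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def is_circularly_balanced(word):
    """Check that, for every letter, any two circular factors of the same
    length differ by at most 1 in the number of occurrences of that letter."""
    n = len(word)
    doubled = word + word  # lets us read circular factors as plain slices
    for length in range(1, n):
        for letter in set(word):
            counts = [doubled[i:i + length].count(letter) for i in range(n)]
            if max(counts) - min(counts) > 1:
                return False
    return True


def runs(word):
    """Number of maximal blocks of equal letters (fewer runs = more clustering)."""
    return 1 + sum(1 for a, b in zip(word, word[1:]) if a != b)


if __name__ == "__main__":
    for w in ("abaababa", "aabb"):
        out = bwt(w)
        print(w, out, runs(out), is_circularly_balanced(w))
```

On the standard word the BWT output consists of a single block of `b` followed by a single block of `a`, while the unbalanced word produces an alternating output with as many runs as letters. The balance check is a brute-force test intended only for short words.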