Fast gapped variants for Lempel-Ziv-Welch compression
Information and Computation
Bridging lossy and lossless compression by motif pattern discovery
General Theory of Information Transfer and Combinatorics
Fast computation of entropic profiles for the detection of conservation in genomes
PRIB'13 Proceedings of the 8th IAPR international conference on Pattern Recognition in Bioinformatics
We present variants of the classical data compression paradigms of Ziv, Lempel, and Welch in which the phrases used in compression are selected among suitably chosen motifs, defined here as strings of intermittently solid and wild characters that recur more or less frequently in the source text string. This notion emerged primarily in the analysis of biological sequences and molecules. Whereas the number of motifs in a sequence or family may be exponential in the size of the input, a linear-sized basis of irredundant motifs may be defined such that any other motif can be obtained by the union of a suitable subset from the basis. Previous work has shown the advantages of using irredundant motifs in lossy as well as lossless off-line compression. In the present paper, we examine adaptations and extensions of the classical incremental LZ and LZW paradigms. First, hybrid schemata are proposed along these lines, in which motifs may be discovered and selected off-line, while the parse and encoding are still conducted on-line. The performance thus obtained improves both over previous off-line implementations of motif-based compression and over the traditionally best implementations of LZW. Building on this, both lossy and lossless motif-based schemata are introduced and tested that follow the LZ and LZW paradigms more closely.
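The two ingredients above, gapped motifs and a hybrid scheme that selects phrases off-line but parses on-line, can be illustrated with a minimal Python sketch. This is not the authors' method: the wildcard notation `.` and the function names `occurrences`, `initial_dictionary`, `encode`, and `decode` are hypothetical. `occurrences` finds where a gapped motif matches a text. The other three implement a lossless LZW whose initial dictionary is pre-seeded with multi-character phrases assumed to have been chosen off-line, while the parse and dictionary updates stay on-line. Motif discovery, the irredundant basis, and lossy gapped phrases are not reproduced; the seeds are solid strings supplied by hand.

```python
from typing import Dict, List, Optional


def occurrences(text: str, motif: str, wild: str = ".") -> List[int]:
    """Start positions where a gapped motif matches; `wild` (assumed '.') matches any character."""
    m = len(motif)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(p == wild or p == t for p, t in zip(motif, text[i : i + m]))
    ]


def initial_dictionary(alphabet: str, seeds: List[str]) -> List[str]:
    """Single symbols first, then multi-character seed phrases (hypothetically selected off-line)."""
    entries = list(dict.fromkeys(alphabet))
    for s in seeds:
        if len(s) > 1 and s not in entries:
            entries.append(s)
    return entries


def encode(text: str, init: List[str]) -> List[int]:
    """On-line LZW parse over a seeded dictionary; emits one code per phrase."""
    codes: Dict[str, int] = {s: i for i, s in enumerate(init)}
    max_len = max(len(s) for s in init)
    out: List[int] = []
    i = 0
    while i < len(text):
        # Seeded dictionaries need not be prefix-closed, so try every length, longest first.
        for length in range(min(max_len, len(text) - i), 0, -1):
            phrase = text[i : i + length]
            if phrase in codes:
                break
        else:
            raise ValueError(f"symbol {text[i]!r} is not in the alphabet")
        out.append(codes[phrase])
        i += len(phrase)
        if i < len(text):
            # Classic LZW update: the matched phrase extended by the next input symbol.
            codes[phrase + text[i]] = len(codes)
            max_len = max(max_len, len(phrase) + 1)
    return out


def decode(code_stream: List[int], init: List[str]) -> str:
    """Mirror of `encode`: rebuilds the same dictionary entries in the same order."""
    table = list(init)
    out: List[str] = []
    prev: Optional[str] = None
    for k in code_stream:
        if k < len(table):
            cur = table[k]
        else:
            # LZW corner case: the code refers to the entry created in this very step.
            cur = prev + prev[0]
        if prev is not None:
            table.append(prev + cur[0])
        out.append(cur)
        prev = cur
    return "".join(out)
```

For example, seeding with `["abc", "ab"]` over the alphabet `"abcd"` lets `encode` cover `"abcabdabcabcabd"` in fewer codes than plain LZW, and `decode` restores the text exactly. The seeded dictionary is not prefix-closed, which is why the sketch tries every candidate length rather than extending a trie one symbol at a time.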