Compression and the Wheel of Fortune

Authors:
Alberto Apostolico;Laxmi Parida
Affiliations:
-;-
Venue:
DCC '03 Proceedings of the Conference on Data Compression
Year:
2003

Abstract

We present data compression techniques hinged on the notion of a motif, interpreted here as a string of intermittently solid and wild characters that recurs more or less frequently in an input sequence or family of sequences. This notion arises originally in the analysis of sequences, particularly biomolecules, due to its multiple implications in the understanding of biological structure and function, and it has been the subject of various characterizations and study. Correspondingly, motif discovery techniques and tools have been devised. This task is made hard by the circumstance that the number of motifs identifiable in general in a sequence can be exponential in the size of that sequence. A significant gain in the direction of reducing the number of motifs is achieved through the introduction of irredundant motifs, which in intuitive terms are motifs of which the structure and list of occurrences cannot be inferred by a combination of other motifs' occurrences. Remarkably, the number of irredundant motifs in a sequence is at worst linear in the length of that sequence. Although suboptimal, the available procedures for the extraction of such motifs are not prohibitively expensive. Here we show that irredundant motifs can be usefully exploited in lossy compression methods based on textual substitution and suitable for signals as well as text. Actually, once the motifs in our lossy encodings are disambiguated into corresponding lossless codebooks, they still prove capable of yielding savings over popular methods in use. Preliminary experiments with these fungible strategies at the crossroads of lossless and lossy data compression show performances that improve over popular methods by more than 20% in lossy and 10% in lossless implementations.
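
To make the notion of a motif with solid and wild characters concrete, the following is a minimal Python sketch and not the authors' procedure. It assumes a hypothetical wildcard symbol `.` (the abstract does not fix a notation) and uses illustrative function names. It lists the occurrences of a given motif, replaces them greedily with a single token (a lossy textual substitution), and records the characters hidden by the wildcards, which is one simple way an encoding of this kind could be made lossless. Motif discovery and the selection of irredundant motifs are not shown.

```python
WILD = "."  # assumed don't-care symbol; the notation is hypothetical


def matches_at(motif, text, i):
    """Return True if motif occurs in text starting at position i."""
    if i + len(motif) > len(text):
        return False
    window = text[i:i + len(motif)]
    # A solid character must match exactly; a wild character matches anything.
    return all(m == WILD or m == c for m, c in zip(motif, window))


def occurrences(motif, text):
    """List every start position at which motif occurs in text."""
    return [i for i in range(len(text) - len(motif) + 1)
            if matches_at(motif, text, i)]


def lossy_substitute(motif, text, token):
    """Greedily replace non-overlapping occurrences of motif by token.

    Also return, per replaced occurrence, the characters covered by the
    wildcards; keeping them alongside the output makes it reversible.
    """
    out, fillers = [], []
    i = 0
    while i < len(text):
        if matches_at(motif, text, i):
            out.append(token)
            fillers.append("".join(text[i + k]
                                   for k, m in enumerate(motif) if m == WILD))
            i += len(motif)
        else:
            out.append(text[i])
            i += 1
    return "".join(out), fillers


text = "xabcyabdz"
motif = "ab."
positions = occurrences(motif, text)
encoded, fillers = lossy_substitute(motif, text, "#")
print(positions, encoded, fillers)
```

In this toy input the motif `ab.` covers both `abc` and `abd`, so a single dictionary entry serves two different substrings. Dropping `fillers` gives the lossy variant; keeping it restores the original text exactly.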