Evolution of human-competitive lossless compression algorithms with GP-zip2

  • Authors:
  • Ahmed Kattan; Riccardo Poli

  • Affiliations:
  • School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK CO4 3SQ; School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK CO4 3SQ

  • Venue:
  • Genetic Programming and Evolvable Machines
  • Year:
  • 2011

Abstract

We propose GP-zip2, a new approach to lossless data compression based on Genetic Programming (GP). GP is used to optimally combine well-known lossless compression algorithms to maximise data compression. GP-zip2 evolves programs with multiple components. One component analyses statistical features extracted by sequentially scanning the data to be compressed and divides the data into blocks. These blocks are projected onto a two-dimensional Euclidean space via two further (evolved) program components. K-means clustering is then applied to group similar data blocks. Each cluster is labelled with the optimal compression algorithm for its member blocks. After evolution, evolved programs can be used to compress unseen data. The compression algorithms available to GP-zip2 are: Arithmetic coding, Lempel-Ziv-Welch, Unbounded Prediction by Partial Matching, Run Length Encoding, and Bzip2. Experimentation shows that the results produced by GP-zip2 are human-competitive, being typically superior to well-established human-designed compression algorithms in terms of the compression ratios achieved in heterogeneous archive files.
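The block-level pipeline described in the abstract (split the data into blocks, project each block onto a 2-D space, cluster with K-means, label each cluster with its best compressor) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several parts are assumptions made for illustration only: the fixed-size `split_blocks`, the hand-written `project` function (byte entropy and mean byte value), and the compressor set are all hypothetical stand-ins. In GP-zip2 the splitting and projection components are evolved by GP, and of the five algorithms listed in the abstract only Bzip2 is available in the Python standard library, so `zlib` and `lzma` are substituted for the others here. The sketch also labels clusters on the same data it compresses, whereas the abstract describes evolved programs being applied to unseen data.

```python
import bz2
import lzma
import math
import random
import zlib
from collections import Counter

# Stand-in compressors (assumption): only bz2 appears in the paper's list;
# zlib and lzma replace AC, LZW, PPM and RLE, which are not in the stdlib.
COMPRESSORS = {"bz2": bz2.compress, "zlib": zlib.compress, "lzma": lzma.compress}
DECOMPRESSORS = {"bz2": bz2.decompress, "zlib": zlib.decompress, "lzma": lzma.decompress}


def split_blocks(data, size=1024):
    """Fixed-size splitting (assumption); GP-zip2 evolves this component."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def project(block):
    """Hand-written 2-D projection (assumption): (scaled entropy, scaled mean byte).
    It stands in for the two evolved projection components."""
    n = len(block)
    counts = Counter(block)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    return (entropy / 8.0, sum(block) / (255.0 * n))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels


def label_clusters(blocks, labels, k):
    """Label each non-empty cluster with the compressor that yields the
    smallest total compressed size over the cluster's member blocks."""
    best = {}
    for j in range(k):
        members = [b for b, lab in zip(blocks, labels) if lab == j]
        if members:
            best[j] = min(COMPRESSORS,
                          key=lambda name: sum(len(COMPRESSORS[name](b)) for b in members))
    return best


def compress(data, k=2):
    """Return a list of (compressor name, compressed block) pairs."""
    blocks = split_blocks(data)
    k = min(k, len(blocks))
    labels = kmeans([project(b) for b in blocks], k)
    best = label_clusters(blocks, labels, k)
    return [(best[lab], COMPRESSORS[best[lab]](b)) for b, lab in zip(blocks, labels)]


def decompress(parts):
    """Invert compress() by dispatching on the stored compressor name."""
    return b"".join(DECOMPRESSORS[name](payload) for name, payload in parts)
```

A quick round trip such as `decompress(compress(data)) == data` confirms that the scheme stays lossless regardless of which compressor each cluster receives; how much it actually saves depends entirely on the projection and the compressor set, which is the part GP-zip2 is designed to optimise.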