A highly efficient XML compression scheme for the web

  • Authors:
  • Przemysław Skibiński;Jakub Swacha;Szymon Grabowski

  • Affiliations:
  • University of Wrocław, Institute of Computer Science, Wrocław, Poland;Institute of Information Technology in Management, Szczecin University, Szczecin, Poland;Technical University of Łódź, Computer Engineering Department, Łódź, Poland

  • Venue:
  • SOFSEM'08 Proceedings of the 34th conference on Current trends in theory and practice of computer science
  • Year:
  • 2008

Abstract

Contemporary XML documents can be tens of megabytes long, and reducing their size, which allows them to be transferred faster, is a significant advantage for their users. In this paper, we describe a new XML compression scheme that outperforms the previous state-of-the-art algorithm, SCMPPM, by over 9% on average in compression ratio, while offering the practical feature of streamlined decompression and decompressing almost twice as fast. Applying the scheme can significantly reduce the transmission time and bandwidth usage for XML documents published on the Web. The proposed scheme is based on a semi-dynamic dictionary of the most frequent words in the document (both in the annotation and in the contents), automatic detection and compact encoding of numbers and specific patterns (such as dates or IP addresses), and a back-end PPM coding variant tailored to efficiently handle long matching sequences. Moreover, we show that the compression ratio can be improved by an additional 9% at the price of a significant slow-down.
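
To make the idea of a semi-dynamic word dictionary more concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the tokenization rule, and the `min_count` and `min_len` thresholds are all assumptions chosen for illustration. The sketch only shows the general two-pass principle (first collect the frequent words of this particular document, then replace them with short indices); the scheme described in the abstract additionally encodes numbers and patterns specially and passes the result to a PPM back-end.

```python
import re
from collections import Counter

# Hypothetical tokenization: runs of ASCII letters are "words",
# everything else (markup characters, digits, spaces) is kept verbatim.
TOKEN_RE = re.compile(r"[A-Za-z]+|[^A-Za-z]+")
WORD_RE = re.compile(r"[A-Za-z]+")


def build_dictionary(text, min_count=2, min_len=3):
    """First pass: collect words frequent enough to be worth indexing.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    counts = Counter(
        tok for tok in TOKEN_RE.findall(text)
        if WORD_RE.fullmatch(tok) and len(tok) >= min_len
    )
    frequent = [word for word, count in counts.items() if count >= min_count]
    # Most frequent words first, so they would receive the shortest codes.
    frequent.sort(key=lambda word: (-counts[word], word))
    return frequent


def encode(text, dictionary):
    """Second pass: replace dictionary words with their integer index."""
    index = {word: i for i, word in enumerate(dictionary)}
    return [index.get(tok, tok) for tok in TOKEN_RE.findall(text)]


def decode(tokens, dictionary):
    """Invert encode(): integers are looked up, strings are copied as-is."""
    return "".join(
        dictionary[tok] if isinstance(tok, int) else tok for tok in tokens
    )


sample = "<book><title>XML</title><title>PPM</title></book>"
dictionary = build_dictionary(sample)
encoded = encode(sample, dictionary)
restored = decode(encoded, dictionary)
```

Because the dictionary is built from the document itself, it has to be stored alongside the encoded data so that the decoder can rebuild the text. Note that both element names (annotation) and text contents are drawn from the same word list here, mirroring the abstract's remark that the dictionary covers both.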