Visually Lossless HTML Compression

  • Authors:
  • Przemysław Skibiński

  • Affiliations:
  • Institute of Computer Science, University of Wrocław, Wrocław, Poland 50-383

  • Venue:
  • WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

The verbosity of the Hypertext Markup Language (HTML) remains one of its main weaknesses. This problem can be solved with the aid of HTML specialized compression algorithms. In this work, we describe a visually lossless HTML transform that, combined with generally used compression algorithms, allows to attain high compression ratios. Its core is a transform featuring substitution of words in an HTML document using a static English dictionary, effective encoding of dictionary indexes, numbers, and specific patterns. Visually lossless compression means that the HTML document layout will be modified, but the document displayed in a browser will provide the exact fidelity with the original. The experimental results show that the proposed transform improves the HTML compression efficiency of general purpose compressors on average by 21% in the case of gzip, achieving comparable processing speed. Moreover, we show that the compression ratio of gzip can be improved by up to 32% for the price of higher memory requirements and much slower processing.