Visually Lossless HTML Compression

Author: Przemysław Skibiński
Affiliation: Institute of Computer Science, University of Wrocław, Wrocław, Poland 50-383
Venue: WISE '09 Proceedings of the 10th International Conference on Web Information Systems Engineering
Year: 2009

Abstract

The verbosity of the Hypertext Markup Language (HTML) remains one of its main weaknesses. This problem can be addressed with HTML-specialized compression algorithms. In this work, we describe a visually lossless HTML transform that, combined with commonly used compression algorithms, attains high compression ratios. The core of the transform is the substitution of words in an HTML document using a static English dictionary, together with effective encoding of dictionary indexes, numbers, and specific patterns. Visually lossless compression means that the layout of the HTML document will be modified, but the document displayed in a browser will be identical to the original. The experimental results show that the proposed transform improves the HTML compression efficiency of general-purpose compressors, on average by 21% in the case of gzip, while achieving comparable processing speed. Moreover, we show that the compression ratio of gzip can be improved by up to 32% at the price of higher memory requirements and much slower processing.
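The general idea of a dictionary-substitution transform placed in front of a general-purpose compressor can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the six-word dictionary, the escape-byte codeword format, the whitespace-collapsing step, and all function names are assumptions made for illustration only. In particular, collapsing whitespace is only one plausible example of a layout change that leaves rendering intact, and this naive version ignores elements such as `<pre>` and `<script>` where whitespace matters.

```python
import re
import zlib

# Hypothetical toy dictionary (assumption); a real transform would use a
# large static English dictionary shared by encoder and decoder.
TOY_DICTIONARY = ["the", "and", "compression", "document", "browser", "transform"]
WORD_TO_INDEX = {word: i for i, word in enumerate(TOY_DICTIONARY)}

ESCAPE = "\x01"    # assumed never to occur in the input text
FIRST_CODE = 0x21  # codeword = ESCAPE + chr(FIRST_CODE + dictionary index)

WORD_RE = re.compile(r"[A-Za-z]+")
CODEWORD_RE = re.compile(re.escape(ESCAPE) + r"(.)", re.DOTALL)


def collapse_whitespace(html: str) -> str:
    """Collapse each whitespace run to one space.

    Illustrative only: this changes the source layout and does NOT
    special-case <pre>, <textarea>, or <script> content.
    """
    return re.sub(r"\s+", " ", html)


def encode(text: str) -> str:
    """Replace exact dictionary words with two-character codewords."""
    if ESCAPE in text:
        raise ValueError("input contains the reserved escape character")

    def substitute(match: re.Match) -> str:
        index = WORD_TO_INDEX.get(match.group(0))
        if index is None:
            return match.group(0)  # unknown word: leave unchanged
        return ESCAPE + chr(FIRST_CODE + index)

    return WORD_RE.sub(substitute, text)


def decode(encoded: str) -> str:
    """Invert encode() by expanding every codeword back to its word."""
    return CODEWORD_RE.sub(
        lambda match: TOY_DICTIONARY[ord(match.group(1)) - FIRST_CODE], encoded
    )


def compressed_size(text: str) -> int:
    """Size in bytes after DEFLATE (the algorithm underlying gzip)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


if __name__ == "__main__":
    page = "<p>the   document and\n the browser</p>" * 50
    normalized = collapse_whitespace(page)
    transformed = encode(normalized)
    assert decode(transformed) == normalized  # the word substitution is reversible
    print("plain + deflate:      ", compressed_size(page))
    print("transformed + deflate:", compressed_size(transformed))
```

The substitution step is exactly reversible, while the whitespace step deliberately is not: the decoder recovers the normalized source rather than the original bytes, which is the sense in which such a scheme is only visually lossless. Whether the transformed text compresses better than the plain text depends heavily on the dictionary and the input, so the sketch prints both sizes rather than assuming an outcome.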