Dictionary-Based Fast Transform for Text Compression
ITCC '03 Proceedings of the International Conference on Information Technology: Computers and Communications
DCC '02 Proceedings of the Data Compression Conference
Compressing XML with Multiplexed Hierarchical PPM Models
DCC '01 Proceedings of the Data Compression Conference
Variable-length contexts for PPM
DCC '04 Proceedings of the Conference on Data Compression
Revisiting dictionary-based compression: Research Articles
Software—Practice & Experience
Using structural contexts to compress semistructured text collections
Information Processing and Management: an International Journal
Effective asymmetric XML compression
Software—Practice & Experience
Mapping words into codewords on PPM
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Hi-index | 0.00 |
The verbosity of the Hypertext Markup Language (HTML) remains one of its main weaknesses. This problem can be solved with the aid of HTML specialized compression algorithms. In this work, we describe a visually lossless HTML transform that, combined with generally used compression algorithms, allows to attain high compression ratios. Its core is a transform featuring substitution of words in an HTML document using a static English dictionary, effective encoding of dictionary indexes, numbers, and specific patterns. Visually lossless compression means that the HTML document layout will be modified, but the document displayed in a browser will provide the exact fidelity with the original. The experimental results show that the proposed transform improves the HTML compression efficiency of general purpose compressors on average by 21% in the case of gzip, achieving comparable processing speed. Moreover, we show that the compression ratio of gzip can be improved by up to 32% for the price of higher memory requirements and much slower processing.