The innate verbosity of the Extensible Markup Language (XML) remains one of its main weaknesses, especially for large documents. This problem can be addressed with dedicated XML compression algorithms. In this work, we describe the XML word-replacing transform (XML-WRT), a fast and fully reversible XML transform which, when combined with widely used LZ77-style compression algorithms, attains high compression ratios, comparable to those achieved by current state-of-the-art XML compressors. The resulting compression scheme is asymmetric in the sense that its decoder is much faster than its coder. This is a desirable practical property, as in many XML applications data are read much more often than written. The key features of the transform are dictionary-based encoding of both document structure and content, separation of different content types into multiple streams, and dedicated encoding of specific patterns, including numbers and dates. The test results show that the proposed transform improves the XML compression efficiency of general-purpose compressors on average by 35% for gzip and by 17% for LZMA. Compared with the current state-of-the-art SCMPPM algorithm, XML-WRT with LZMA attains a compression ratio more than 2% better, while being 55% faster. Copyright © 2007 John Wiley & Sons, Ltd.
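The transform features listed in the abstract can be illustrated with a minimal Python sketch. This is not XML-WRT's actual format. The tokenizer, the flag characters, the one-character codewords, and the thresholds MIN_FREQ and MAX_DICT are all hypothetical choices made for illustration. The sketch shows a semi-static word dictionary built in a first pass, an adaptive dictionary for tags, separate content, structure, and number streams, and a binary (base-128) encoding of integers. Dates and other patterns are not handled, and each whole tag, attributes included, is treated as a single structural token. Python's zlib (Deflate, the method used by gzip) and lzma modules stand in for the back-end LZ77-family compressors.

```python
import lzma
import re
import zlib
from collections import Counter
from string import ascii_letters, digits

# Hypothetical tokenizer: a tag, an ASCII word, a digit run, any other run, or a stray "<".
TOKEN_RE = re.compile(r"<[^<>]*>|[A-Za-z]+|[0-9]+|[^<A-Za-z0-9]+|<")
TAG, NUM, WORD, ESC = "\x01", "\x02", "\x03", "\x04"  # flags in the content stream
CODE_BASE = 0xE000  # codewords are single characters from U+E000 upward
MIN_FREQ, MAX_DICT = 2, 4096  # assumed dictionary-admission thresholds


def varint(n: int) -> bytes:  # little-endian base-128 integer encoding
    out = bytearray()
    while True:
        byte, n = n & 0x7F, n >> 7
        out.append(byte | (0x80 if n else 0))
        if not n:
            return bytes(out)


def read_varints(data: bytes):
    n = shift = 0
    for b in data:
        n |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            yield n
            n = shift = 0


def encode(xml: str) -> tuple:
    tokens = TOKEN_RE.findall(xml)
    # Pass 1 (semi-static): frequent words form the content dictionary.
    freq = Counter(t for t in tokens if t[0] in ascii_letters)
    words = [w for w, c in freq.most_common(MAX_DICT) if c >= MIN_FREQ]
    word_ids = {w: i for i, w in enumerate(words)}
    tag_ids, content, struct, numbers = {}, [], [], bytearray()
    for t in tokens:  # Pass 2: route every token to its stream.
        if t[0] == "<" and len(t) > 1:  # markup goes to the structure stream
            content.append(TAG)
            if t in tag_ids:  # repeated tag: one-character codeword
                struct.append(chr(CODE_BASE + tag_ids[t]))
            else:  # first occurrence: stored literally and added to the tag dictionary
                tag_ids[t] = len(tag_ids)
                struct.append(t)
        elif t[0] in digits and len(t) <= 18 and t == str(int(t)):
            content.append(NUM)  # canonical integer goes to the binary number stream
            numbers += varint(int(t))
        elif t in word_ids:  # frequent word: flag + codeword
            content.append(WORD + chr(CODE_BASE + word_ids[t]))
        else:  # literal text, with flag characters escaped
            content.append(re.sub(r"[\x01-\x04]", lambda m: ESC + m.group(), t))
    return ("\n".join(words).encode(), "".join(content).encode(),
            "".join(struct).encode(), bytes(numbers))


def decode(streams: tuple) -> str:
    words = streams[0].decode().split("\n")
    content, struct = streams[1].decode(), streams[2].decode()
    nums, tags, seen, i = read_varints(streams[3]), [], [], 0
    while i < len(struct):  # rebuild the tag sequence and the adaptive tag dictionary
        if struct[i] == "<":
            j = struct.index(">", i) + 1
            seen.append(struct[i:j])
            tags.append(struct[i:j])
            i = j
        else:
            tags.append(seen[ord(struct[i]) - CODE_BASE])
            i += 1
    tag_iter, out, i = iter(tags), [], 0
    while i < len(content):  # one left-to-right pass over the content stream
        c = content[i]
        if c == TAG:
            out.append(next(tag_iter))
        elif c == NUM:
            out.append(str(next(nums)))
        elif c in (WORD, ESC):  # both flags consume the following character
            nxt = content[i + 1]
            out.append(words[ord(nxt) - CODE_BASE] if c == WORD else nxt)
            i += 1
        else:
            out.append(c)
        i += 1
    return "".join(out)


sample = "<log>" + "".join(
    f"<entry><id>{i}</id><msg>disk check passed on node {i % 7}</msg></entry>\n"
    for i in range(300)) + "</log>"
streams = encode(sample)
assert decode(streams) == sample
for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    plain = len(compress(sample.encode()))
    transformed = sum(len(compress(s)) for s in streams)  # each stream compressed separately
    print(name, plain, transformed)
```

The round-trip assertion checks that the transform is reversible. The printed sizes compare each back end on the raw document and on the separately compressed streams; they depend on the input, so no particular gain should be read into them. In this sketch, decoding is a single pass driven by table lookups, whereas encoding needs an extra pass to count words, which loosely mirrors the coder/decoder asymmetry the abstract describes.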