Tradeoffs in XML Database Compression

  • Authors: James Cheney
  • Affiliations: University of Edinburgh, UK
  • Venue: DCC '06 Proceedings of the Data Compression Conference
  • Year: 2006

Abstract

Large XML data files, or XML databases, are now a common way to distribute scientific and bibliographic data, and storing such data efficiently is an important concern. A number of approaches to XML compression have been proposed in the last five years. The most competitive approaches employ one or more statistical text compressors based on PPM or arithmetic coding, in which some of the context is provided by the XML document structure. The purpose of this paper is to investigate the relationships among the existing proposals in more detail. We review the two main statistical modeling approaches proposed so far and evaluate their performance on two representative XML databases. Our main finding is that while a recently proposed multiple-model approach can provide better overall compression for large databases, it uses much more memory and converges more slowly than an older single-model approach.
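
To make the contrast in the abstract concrete, here is a minimal Python sketch of the two structural modeling strategies, not the actual XMLPPM or SCMPPM implementations: a single shared adaptive model that injects the enclosing element tag into the coding context, versus one adaptive model per element tag. The AdaptiveModel class, the order-1 context scheme, and the sample data are all illustrative assumptions; costs are ideal arithmetic-coding lengths rather than real coder output.

```python
# Toy sketch (assumed names and data, not the papers' algorithms):
# contrasts structure-as-context vs. structure-selects-model.
import math
from collections import defaultdict

class AdaptiveModel:
    """Adaptive order-1 byte model; cost is the ideal arithmetic-coding
    length -log2 p(sym | ctx), with add-one smoothing over 256 symbols."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.totals = defaultdict(int)

    def cost_and_update(self, ctx, sym):
        p = (self.counts[ctx][sym] + 1) / (self.totals[ctx] + 256)
        self.counts[ctx][sym] += 1
        self.totals[ctx] += 1
        return -math.log2(p)

def single_model_bits(events):
    """One shared model; the element tag seeds the coding context, so
    structure conditions a single set of statistics."""
    model, bits = AdaptiveModel(), 0.0
    for tag, text in events:
        ctx = tag                     # structure enters via the context
        for sym in text.encode():
            bits += model.cost_and_update(ctx, sym)
            ctx = sym
    return bits

def multi_model_bits(events):
    """One model per element tag: sharper per-element statistics, but
    each model keeps its own tables (more memory) and sees less data
    (slower convergence)."""
    models, bits = defaultdict(AdaptiveModel), 0.0
    for tag, text in events:
        model, ctx = models[tag], 0
        for sym in text.encode():
            bits += model.cost_and_update(ctx, sym)
            ctx = sym
    return bits

if __name__ == "__main__":
    # Hypothetical bibliographic fragment: (element tag, text content).
    events = [("author", "Cheney, James"), ("year", "2006"),
              ("title", "Tradeoffs in XML Database Compression")] * 40
    print(f"single shared model: {single_model_bits(events):.0f} bits")
    print(f"per-element models:  {multi_model_bits(events):.0f} bits")
```

On input this small, the per-element models pay a visible learning cost because each one starts from empty statistics, which is the convergence effect the abstract describes; as the data grows, their specialized statistics can overtake the shared model, at the price of one frequency table per element.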