XQueC: A query-conscious compressed XML database

  • Authors:
  • Andrei Arion; Angela Bonifati; Ioana Manolescu; Andrea Pugliese

  • Affiliations:
  • INRIA Futurs - LRI, PCRI, France; ICAR CNR, Italy; INRIA Futurs - LRI, PCRI, France; DEIS - University of Calabria, Italy

  • Venue:
  • ACM Transactions on Internet Technology (TOIT)
  • Year:
  • 2007

Abstract

XML compression has gained prominence recently because it counters the disadvantage of the verbose representation XML gives to data. In many applications, such as data exchange and data archiving, entirely compressing and decompressing a document is acceptable. In other applications, where queries must be run over compressed documents, compression may not be beneficial since the performance penalty in running the query processor over compressed data outweighs the data compression benefits. While balancing the interests of compression and query processing has received significant attention in the domain of relational databases, these results do not immediately translate to XML data. In this article, we address the problem of embedding compression into XML databases without degrading query performance. Since the setting is rather different from relational databases, the choice of compression granularity and compression algorithms must be revisited. Query execution in the compressed domain must also be rethought in the framework of XML query processing due to the richer structure of XML data. Indeed, a proper storage design for the compressed data plays a crucial role here. The XQueC system (XQuery Processor and Compressor) covers a wide set of XQuery queries in the compressed domain and relies on a workload-based cost model to perform the choices of the compression granules and of their corresponding compression algorithms. As a consequence, XQueC provides efficient query processing on compressed XML data. An extensive experimental assessment is presented, showing the effectiveness of the cost model, the compression ratios, and the query execution times.
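
To make the idea of evaluating queries "in the compressed domain" more concrete, the following is a minimal sketch, not XQueC's actual implementation. It assumes a simple order-preserving dictionary encoding of string values (a technique chosen here purely for illustration; the abstract does not say which algorithms XQueC uses), so that a range predicate can be checked on integer codes without decompressing any stored value. All function names and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Illustrative assumption: an order-preserving dictionary encoding.
# This is a generic technique, not the encoding used by XQueC.

def build_dictionary(values):
    """Return the sorted distinct values and a value -> code map.

    Codes are assigned in sort order, so code(a) < code(b) iff a < b.
    """
    distinct = sorted(set(values))
    return distinct, {v: i for i, v in enumerate(distinct)}

def compress(values, codes):
    """Replace every value by its integer code."""
    return [codes[v] for v in values]

def range_query(compressed, distinct, low, high):
    """Positions whose original value v satisfies low <= v <= high.

    Only the two query constants are translated into code bounds;
    the stored data is compared as integers and never decompressed.
    """
    lo_code = bisect_left(distinct, low)
    hi_code = bisect_right(distinct, high) - 1
    return [i for i, c in enumerate(compressed) if lo_code <= c <= hi_code]

# Hypothetical text values, e.g. the contents of one element type.
names = ["delta", "alpha", "charlie", "bravo", "alpha", "echo"]
distinct, codes = build_dictionary(names)
compressed = compress(names, codes)
hits = range_query(compressed, distinct, "b", "d")
print(compressed)  # [3, 0, 2, 1, 0, 4]
print(hits)        # [2, 3]  ("charlie" and "bravo")
```

An encoding that did not preserve order would still support equality predicates on codes but would force decompression for range predicates. Trade-offs of this kind are what a workload-based choice of compression algorithm has to weigh.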