XML Stream Data Reduction by Shared KST Signatures

  • Authors:
  • Affiliations:
  • Venue: HICSS '09 Proceedings of the 42nd Hawaii International Conference on System Sciences
  • Year: 2009

Abstract

Within XML data streams, markup, as defined for example in a DTD, is used not only for structuring large amounts of data, but also for efficiently searching, accessing, and processing the required parts of the data streams. However, when huge amounts of XML data are involved, data reduction or compression techniques that still allow the required parts of the data to be found quickly may become crucial for data processing. We present a data reduction and compression technique for XML data streams that not only significantly reduces the amount of data, but also allows for efficient data processing without requiring full decompression. Our data reduction technique combines sub-tree sharing with the removal of structure that is already known from a DTD. We have performed extensive performance evaluations comparing our compression technique with other approaches to XML compression, and we show that we outperform not only these other techniques, but also string compression techniques such as gzip, which do not support query processing on compressed data.
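
To make the idea of sub-tree sharing concrete, the following is a minimal sketch in Python. It is not the KST signature scheme of the paper, whose details the abstract does not give. It only illustrates the general principle that identical sub-trees can be stored once and referenced by an identifier. The function name `share_subtrees`, the table layout, and the sample document are assumptions made for illustration. Attributes and tail text are ignored, and the DTD-based removal of known structure is not covered.

```python
import xml.etree.ElementTree as ET


def share_subtrees(root):
    """Store every distinct sub-tree once and return (table, root_id).

    Hypothetical helper for illustration only: attributes and tail text
    are ignored, and the whole document is assumed to fit in memory.
    """
    ids = {}     # sub-tree signature -> node id
    table = []   # node id -> (tag, text, tuple of child ids)

    def visit(elem):
        # Children are visited first, so identical sub-trees get identical ids.
        children = tuple(visit(child) for child in elem)
        signature = (elem.tag, (elem.text or "").strip(), children)
        if signature not in ids:
            ids[signature] = len(table)
            table.append(signature)
        return ids[signature]

    return table, visit(root)


# Sample document (an assumption): the first two orders are identical.
doc = ET.fromstring(
    "<orders>"
    "<order><item>pen</item><qty>2</qty></order>"
    "<order><item>pen</item><qty>2</qty></order>"
    "<order><item>ink</item><qty>2</qty></order>"
    "</orders>"
)

table, root_id = share_subtrees(doc)
total_nodes = sum(1 for _ in doc.iter())
print(f"{total_nodes} element nodes stored as {len(table)} shared entries")
```

In this sketch a repeated sub-tree costs only one extra integer in its parent's child list, and a consumer can follow child ids to navigate the structure without first rebuilding the original text, which is the property that general-purpose string compressors such as gzip do not offer.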