XML Lossy Text Compression: A Preliminary Study

  • Authors:
  • Angela Bonifati; Marianna Lorusso; Domenica Sileo

  • Affiliations:
  • Italian National Research Council (CNR), Rende, Italy I-87036 and Dipartimento di Matematica e Informatica, University of Basilicata, Potenza, Italy I-85100; Dipartimento di Matematica e Informatica, University of Basilicata, Potenza, Italy I-85100; Dipartimento di Matematica e Informatica, University of Basilicata, Potenza, Italy I-85100

  • Venue:
  • XSym '09 Proceedings of the 6th International XML Database Symposium on Database and XML Technologies
  • Year:
  • 2009

Abstract

Lossy compression techniques have been applied to image and text compression, yielding compression factors far higher than those of lossless compression schemes. In this paper, we present a preliminary study of a set of semantics-preserving lossy transformations for XML documents. Inspired by previous techniques such as lossy text compression and literate programming, we apply a simple algorithm to XML syntactic constructs to discard superfluous layout information and redundant text. The resulting XML retains both human readability and machine readability. Additionally, the transformation can considerably reduce the document's space occupancy and improve the effectiveness of conventional text compressors, making it a promising technology for several data management tasks.
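To make the idea of "discarding superfluous layout information" concrete, here is a minimal sketch of one such lossy pass, written with Python's standard `xml.etree.ElementTree`. It is not the authors' algorithm. The function `strip_layout`, the helper `sizes`, and the sample document are hypothetical and exist only for illustration. The sketch assumes that whitespace-only text between elements (indentation and line breaks) carries no meaning, and that runs of whitespace inside text can be collapsed to a single space. It then compares raw and zlib-compressed sizes before and after the transformation.

```python
import re
import zlib
import xml.etree.ElementTree as ET


def _collapse(text):
    """Hypothetical rule: drop whitespace-only text, collapse other whitespace runs."""
    if text is None or not text.strip():
        return None  # pure layout (indentation, newlines) is discarded
    return re.sub(r"\s+", " ", text)  # keep single spaces so words stay separated


def strip_layout(xml_text: str) -> str:
    """Illustrative lossy pass over element text and tails (assumption, not the paper's method)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.text = _collapse(elem.text)  # text before the first child
        elem.tail = _collapse(elem.tail)  # text after the element's end tag
    return ET.tostring(root, encoding="unicode")


def sizes(xml_text: str):
    """Return (raw bytes, zlib-compressed bytes) for a UTF-8 encoded document."""
    raw = xml_text.encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


doc = """<catalog>
    <book id="1">
        <title>  XML   Compression  </title>
        <author>A. Author</author>
    </book>
</catalog>"""

slim = strip_layout(doc)
print(slim)
print("original (raw, zlib):", sizes(doc))
print("stripped (raw, zlib):", sizes(slim))
```

The output stays well-formed XML that both people and parsers can read, which matches the readability property claimed above. Collapsing whitespace is only safe when the schema treats whitespace as insignificant, so content such as `xml:space="preserve"` regions or preformatted text would need to be excluded in a real tool.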