Compacting XML documents

  • Authors:
  • Miklós Kálmán;Ferenc Havasi;Tibor Gyimóthy

  • Affiliations:
  • Department of Software Engineering, Aradi vértanuk tere 1., H-6720 Szeged, Hungary;Department of Software Engineering, Aradi vértanuk tere 1., H-6720 Szeged, Hungary;Department of Software Engineering, Aradi vértanuk tere 1., H-6720 Szeged, Hungary

  • Venue:
  • Information and Software Technology
  • Year:
  • 2006

Abstract

XML is now one of the most common formats for storing information. Its biggest drawback is that XML documents are rather large compared to the information they store. XML documents may contain redundant attributes whose values can be calculated from other attributes. These redundant attributes can be deleted from the original XML document, provided the calculation rules are stored. In an Attribute Grammar environment there is an analogous description for these rules: semantic rules. To use this technique in an XML environment, we defined a new metalanguage called SRML and developed a method that uses SRML to compact XML documents. After compaction, XML compressors can be applied to make the compacted document much smaller. With this combined approach we achieved a significant size reduction compared to the compressed size produced by XML-specific compressors. This article extends the previously published method with the automatic generation of rules using machine learning techniques, which can find relationships between attributes that the user might not have noticed beforehand.
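
As a rough illustration of the compaction idea described in the abstract, the sketch below removes an attribute that can be recomputed from other attributes and restores it later. It is not SRML and not the authors' implementation: the `RULES` table, the `item`/`price`/`qty`/`total` names, and the `compact`/`restore` functions are all hypothetical, chosen only to show the principle in Python with the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table (an assumption, not SRML syntax):
# (element tag, attribute name) -> function that computes the attribute
# from the element's other attributes.
RULES = {
    ("item", "total"): lambda e: str(int(e.get("price")) * int(e.get("qty"))),
}

def compact(root):
    """Delete each attribute whose stored value equals the value its rule computes."""
    for elem in root.iter():
        for (tag, attr), rule in RULES.items():
            if elem.tag == tag and attr in elem.attrib and elem.get(attr) == rule(elem):
                del elem.attrib[attr]
    return root

def restore(root):
    """Recompute each rule-covered attribute that is missing from its element."""
    for elem in root.iter():
        for (tag, attr), rule in RULES.items():
            if elem.tag == tag and attr not in elem.attrib:
                elem.set(attr, rule(elem))
    return root
```

In this sketch an attribute is dropped only when the rule reproduces its stored value exactly, so values that deviate from the rule stay in the document and survive the round trip. One simplification to note: `restore` cannot tell a removed attribute from one that was never present, so it assumes every `item` originally carried a `total`.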