Searchable compression of office documents by XML schema subtraction

  • Authors:
  • Stefan Böttcher; Rita Hartel; Christian Messinger

  • Affiliations:
  • University of Paderborn, Computer Science, Paderborn, Germany; University of Paderborn, Computer Science, Paderborn, Germany; University of Paderborn, Computer Science, Paderborn, Germany

  • Venue:
  • XSym'10 Proceedings of the 7th international XML database conference on Database and XML technologies
  • Year:
  • 2010

Abstract

Starting with Microsoft Office 2007, the Office Open XML file formats have become the default file formats of Microsoft Office. Because large numbers of office documents are stored and transferred every day, reducing document size benefits both storage and transfer. We present a compressed format for XML-based office documents that omits from an office document the data that is already defined by the Office Open XML format. Our evaluation shows that our compressed format reduces the already compressed office documents to as little as 41% of the original document size. Furthermore, for the search operations tested in our evaluation, searching is faster on our compressed office documents than on the original documents.
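
The abstract does not describe the compressed format itself, so the following is only a toy sketch of the general idea of leaving out what a schema already fixes. It assumes a hypothetical schema (`TOY_SCHEMA`) in which every element has a fixed sequence of children, so tag names can be dropped and later rebuilt from the schema. The names `TOY_SCHEMA`, `subtract`, and `restore` are illustrative assumptions and are not taken from the paper or from Office Open XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema (an assumption for illustration, not Office Open XML):
# each listed element must contain exactly these children, in this order.
# Elements not listed here are treated as text-only leaves.
TOY_SCHEMA = {
    "doc": ["title", "body"],
    "body": ["para", "note"],
}


def subtract(elem, schema=TOY_SCHEMA):
    """Keep only the text values in document order; the schema implies the tags."""
    expected = schema.get(elem.tag)
    if expected is None:
        return [elem.text or ""]
    if [child.tag for child in elem] != expected:
        raise ValueError(f"<{elem.tag}> does not match the toy schema")
    values = []
    for child in elem:
        values.extend(subtract(child, schema))
    return values


def restore(root_name, values, schema=TOY_SCHEMA):
    """Rebuild the element tree from the schema plus the stored text values."""
    remaining = iter(values)

    def build(name):
        elem = ET.Element(name)
        expected = schema.get(name)
        if expected is None:
            elem.text = next(remaining)
        else:
            for child_name in expected:
                elem.append(build(child_name))
        return elem

    return build(root_name)


original = (
    "<doc><title>Report</title>"
    "<body><para>Hello</para><note>Draft</note></body></doc>"
)
values = subtract(ET.fromstring(original))
rebuilt = ET.tostring(restore("doc", values), encoding="unicode")
```

In this sketch the stored form is just the list of text values, and a text search can scan that list directly without parsing any markup. Real schemas allow optional, repeated, and alternative content, which would require storing additional information; the toy schema deliberately ignores all of that.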