Lempel-Ziv Compression of Structured Text

  • Authors:
  • Joaquín Adiego;Gonzalo Navarro;Pablo de la Fuente

  • Venue:
  • DCC '04 Proceedings of the Conference on Data Compression
  • Year:
  • 2004

Abstract

We describe a novel Lempel-Ziv approach suitable for compressing structured documents, called LZCS, which takes advantage of redundant information that can appear in the structure. The main idea is that frequently repeated subtrees may exist and these can be replaced by a backward reference to their first occurrence. The main advantage is that compressed documents generated by LZCS are easy to display, access at random, and navigate. In a second stage, processed documents can be further compressed using some semiadaptive technique, so that random access and navigability remain possible. LZCS is especially efficient for compressing collections of highly structured data, such as XML forms, invoices, e-commerce and web-service exchange documents. The comparison against structure-based and standard compressors shows that LZCS is a competitive choice for this type of document, while the others are not well-suited to support navigation or random access.
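
The subtree-replacement idea can be illustrated with a minimal Python sketch. This is not the authors' LZCS implementation or output format. It only shows the general principle of replacing a repeated subtree with a backward reference to its first occurrence. The `ref` element, its `to` attribute, the preorder numbering scheme, the whitespace normalization, and the sample invoice data are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def subtree_key(node):
    """Canonical, hashable description of a subtree.

    Assumption: leading/trailing whitespace in text is not significant.
    """
    return (
        node.tag,
        tuple(sorted(node.attrib.items())),
        (node.text or "").strip(),
        tuple((subtree_key(c), (c.tail or "").strip()) for c in node),
    )


def replace_repeated_subtrees(root):
    """Rewrite `root` in place; return how many subtrees became references.

    Each repeated subtree is replaced by a hypothetical <ref to="N"/> element,
    where N is the preorder position (in the rewritten document) of the
    first occurrence of that subtree.
    """
    first_seen = {}  # subtree key -> preorder position of first occurrence
    position = 0     # preorder position in the rewritten document
    replaced = 0

    def visit(node):
        nonlocal position, replaced
        first_seen.setdefault(subtree_key(node), position)
        position += 1
        for index, child in enumerate(list(node)):
            key = subtree_key(child)
            if key in first_seen:
                # Seen before: emit a backward reference instead of the subtree.
                ref = ET.Element("ref", {"to": str(first_seen[key])})
                ref.tail = child.tail
                node[index] = ref
                position += 1
                replaced += 1
            else:
                visit(child)

    visit(root)
    return replaced


# Illustrative data only: two invoices sharing the same <seller> subtree.
doc = ET.fromstring(
    "<invoices>"
    "<invoice><seller><name>ACME</name><city>Example City</city></seller>"
    "<total>10</total></invoice>"
    "<invoice><seller><name>ACME</name><city>Example City</city></seller>"
    "<total>25</total></invoice>"
    "</invoices>"
)
count = replace_repeated_subtrees(doc)
print(count)
print(ET.tostring(doc, encoding="unicode"))
```

In this sketch the second `seller` subtree is replaced by a reference to the first one, while the rest of the document keeps its tree shape. That is the property the abstract relies on when it says the result stays easy to navigate. Recomputing `subtree_key` at every node is quadratic in the worst case, which keeps the sketch short but would not suit large collections.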