Partial retrieval of compressed semi-structured documents

  • Authors:
  • Ashutosh Gupta; Suneeta Agarwal

  • Affiliations:
  • Department of Computer Science & Information Technology, Institute of Engineering and Technology, MJP Rohilkhand University, Bareilly, India; Department of Computer Science & Engineering, MNNIT, Allahabad, India

  • Venue:
  • International Journal of Computer Applications in Technology
  • Year:
  • 2010

Abstract

We describe a compression model for semi-structured documents called the tri-structural contexts model (TSCM). The idea is that separating start tags, attribute names and values, and textual words from one another may reduce entropy. We also combine each attribute name with its value and store the resulting pairs in a separate container. We focus mainly on semi-static models and test the idea using a word-based tagged code, which allows random access to, and partial decompression of, the compressed collection. Compression time is better than that of scmhuff, and decompression time is much lower than that of both scmhuff and xmlppm. The short partial-decompression time supports using TSCM to keep semi-structured documents compressed at all times. The algorithm and the proposed model are useful in information retrieval systems.
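The abstract does not spell out the container layout or which tagged code was used, so the following is only a minimal sketch under stated assumptions. It splits an XML document into three hypothetical containers (start-tag names, combined `name=value` attribute tokens, and text words), builds a semi-static vocabulary ranked by frequency, and encodes tokens with End-Tagged Dense Code. That code is swapped in here as one example of a word-based tagged byte code and is not necessarily the code the authors used. Because the last byte of every codeword has its high bit set, a decoder can start at an arbitrary byte offset, skip to the next codeword boundary, and decompress just a few tokens. All function names (`split_containers`, `etdc_codeword`, `build_code`, `compress`, `decompress_from`) are hypothetical, and the sketch does not record the structure needed to rebuild the original document.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

WORD_RE = re.compile(r"\w+")


def split_containers(xml_text):
    """Hypothetical TSCM-style split into three containers: start-tag names,
    combined attribute name=value tokens, and text words.
    (Structural information needed to rebuild the document is omitted.)"""
    tags, attrs, words = [], [], []

    def walk(elem):
        tags.append(elem.tag)
        attrs.extend(f"{k}={v}" for k, v in elem.attrib.items())
        words.extend(WORD_RE.findall(elem.text or ""))
        for child in elem:
            walk(child)
            words.extend(WORD_RE.findall(child.tail or ""))

    walk(ET.fromstring(xml_text))
    return tags, attrs, words


def etdc_codeword(rank):
    """End-Tagged Dense Code: the final byte of a codeword is >= 128 and every
    earlier byte is < 128, so codeword boundaries are visible in the stream."""
    out = [128 + rank % 128]
    rank //= 128
    while rank > 0:
        rank -= 1
        out.insert(0, rank % 128)
        rank //= 128
    return bytes(out)


def build_code(tokens):
    """Semi-static model: count once, then give shorter codewords to more
    frequent tokens (rank 0 = most frequent)."""
    ranked = [tok for tok, _ in Counter(tokens).most_common()]
    return {tok: etdc_codeword(r) for r, tok in enumerate(ranked)}


def compress(tokens, code):
    """Concatenate the codewords of all tokens into one byte string."""
    return b"".join(code[t] for t in tokens)


def decompress_from(data, code, offset, count):
    """Partial decompression: move to the first codeword starting at or after
    `offset`, then decode at most `count` tokens."""
    if count <= 0:
        return []
    decode = {cw: tok for tok, cw in code.items()}
    # A codeword starts where the previous byte is a tagged (>= 128) end byte.
    while 0 < offset < len(data) and data[offset - 1] < 128:
        offset += 1
    out, start = [], offset
    for pos in range(offset, len(data)):
        if data[pos] >= 128:  # tag bit set: this byte ends a codeword
            out.append(decode[data[start:pos + 1]])
            start = pos + 1
            if len(out) == count:
                break
    return out
```

For example, with `doc = '<lib><book id="1" lang="en">data compression</book><book id="2" lang="en">text compression</book></lib>'`, `split_containers(doc)` yields the word container `["data", "compression", "text", "compression"]`. After `code = build_code(words)` and `data = compress(words, code)`, the call `decompress_from(data, code, 2, 2)` returns `["text", "compression"]` without decoding the first two words. Storing each attribute name together with its value as a single token follows the abstract's description of a combined attribute container.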