Re-Ordered FEGC and Block Based FEGC for Inverted File Compression

  • Authors:
  • S. Domnic; V. Glory

  • Affiliations:
  • Department of Computer Applications, National Institute of Technology, Tiruchirappalli, Tamil Nadu, India; National Institute of Technology, Tiruchirappalli, Tamil Nadu, India

  • Venue:
  • International Journal of Information Retrieval Research
  • Year:
  • 2013

Abstract

Data compression is widely used in Information Retrieval applications such as web search engines and digital libraries to speed up data retrieval. In these applications, universal codes such as Elias codes (EC), Fibonacci code (FC), Rice code (RC), Extended Golomb code (EGC) and Fast Extended Golomb code (FEGC) have generally been preferred over statistical codes such as Huffman and arithmetic codes. Universal codes are easier to construct and decode than statistical codes. In this paper, the authors propose two methods for constructing universal codes based on ideas used in Rice code and Fast Extended Golomb code. The first, Re-ordered FEGC (RFEGC), is suitable for representing small, middle and large range integers, whereas Rice code works well for small and middle range integers. RFEGC is competitive with FC, EGC and FEGC in representing integers across all three ranges, and it can be faster at decoding than FC, EGC and FEGC. The second coder, Block based RFEGC, uses a local divisor rather than a global divisor to improve both the compression and decompression performance of RFEGC. To evaluate these coders, the authors apply them to compress the integer values of inverted files constructed from the TREC, Wikipedia and FIRE collections. Experimental results show that the proposed coders achieve better compression and decompression performance on files containing a significant proportion of middle and large range integers.
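The abstract does not describe how RFEGC or Block based RFEGC form their codewords, so the sketch below does not implement them. As an illustration only, it shows the two ingredients the abstract names, using plain Rice coding: each integer n is written as a unary quotient n >> k followed by a k-bit remainder (divisor 2^k). It then contrasts one global divisor for a whole list with a separately chosen local divisor per block. The function names, the exhaustive search for k, the 5-bit per-block parameter field (`HEADER_BITS`), the block size and the input values are all hypothetical choices made for this example. None of them come from the paper.

```python
from typing import List, Tuple

HEADER_BITS = 5  # hypothetical: fixed-width field storing one Rice parameter k (0..31)


def rice_encode(n: int, k: int) -> str:
    """Rice-code n >= 0 with divisor 2**k: unary quotient, '0' terminator, k-bit remainder."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k else "")


def rice_decode(bits: str, k: int) -> List[int]:
    """Decode a concatenation of Rice codewords that all share parameter k."""
    values, i = [], 0
    while i < len(bits):
        q = 0
        while bits[i] == "1":  # unary part: count ones up to the terminating zero
            q += 1
            i += 1
        i += 1  # skip the terminating zero
        r = int(bits[i:i + k], 2) if k else 0
        i += k
        values.append((q << k) | r)
    return values


def encoded_length(values: List[int], k: int) -> int:
    """Total Rice-code length in bits: quotient + 1 + k bits per value."""
    return sum((v >> k) + 1 + k for v in values)


def best_k(values: List[int]) -> int:
    """Try every k from 0 to the bit length of the largest value; keep the shortest."""
    max_k = max(values).bit_length()
    return min(range(max_k + 1), key=lambda k: encoded_length(values, k))


def encode_global(values: List[int]) -> Tuple[int, str]:
    """One divisor for the whole list (a 'global divisor' setting)."""
    k = best_k(values)
    return k, "".join(rice_encode(v, k) for v in values)


def encode_blocks(values: List[int], block_size: int) -> List[Tuple[int, str]]:
    """A separately chosen divisor per fixed-size block (a 'local divisor' setting)."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        k = best_k(block)
        out.append((k, "".join(rice_encode(v, k) for v in block)))
    return out


# Made-up values: a run of small integers followed by a run of larger ones.
gaps = [1, 3, 2, 1, 200, 180, 300, 250]

k, bits = encode_global(gaps)
assert rice_decode(bits, k) == gaps
global_cost = HEADER_BITS + len(bits)  # one parameter field plus the payload

blocks = encode_blocks(gaps, block_size=4)
assert [v for k_b, b in blocks for v in rice_decode(b, k_b)] == gaps
block_cost = sum(HEADER_BITS + len(b) for _, b in blocks)  # one field per block

print(k, global_cost)  # 6 73
print([k_b for k_b, _ in blocks], block_cost)  # [1, 7] 57
```

The unary quotient grows linearly with n >> k, so a single k cannot serve both very small and much larger values well. In this made-up list, the small and large values fall in separate blocks, so the per-block parameters more than pay for their extra header bits. When the values are spread evenly, one global divisor may do as well or better.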