External sorting with on-the-fly compression

  • Authors: John Yiannis; Justin Zobel

  • Affiliations: School of Computer Science and Information Technology, RMIT University, Melbourne, Australia (both authors)

  • Venue: BNCOD'03: Proceedings of the 20th British National Conference on Databases
  • Year: 2003

Abstract

Evaluating a query can involve manipulating large volumes of temporary data. When the volume of data becomes too great, operations such as joins and sorting must use disk, and cost minimisation involves complex trade-offs. In this paper, we explore the effect of compression on the cost of external sorting. Reducing the volume of data potentially reduces costs, through less disk traffic and fewer temporary files, but on-the-fly compression can be slow, and many compression methods do not allow random access to individual records. We investigate a range of compression techniques for this problem and develop successful methods based on common letter sequences. Our experiments show that, for a given memory limit, the overheads of compression outweigh the benefits for smaller data volumes, but for large files compression can yield substantial gains, reducing costs by one-third in the best case tested. Even when the data is stored uncompressed, our results show that incorporating compression can significantly accelerate query processing.
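
To make the general idea concrete, the following is a minimal sketch of an external merge sort whose temporary runs are compressed as they are written. It is not the method developed in the paper: the authors' compression is based on common letter sequences, whereas this sketch swaps in general-purpose gzip stream compression from the Python standard library as a stand-in. The function names (`write_run`, `read_run`, `external_sort`), the `max_records` parameter, and the assumption that each record is a string containing no newline character are all hypothetical choices made for illustration.

```python
import gzip
import heapq
import os
import tempfile


def write_run(batch, directory):
    """Sort one in-memory batch and write it to a gzip-compressed run file."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run.gz")
    os.close(fd)
    # newline="\n" disables newline translation, so records round-trip exactly.
    with gzip.open(path, "wt", encoding="utf-8", newline="\n") as f:
        for rec in sorted(batch):
            # Assumption: records never contain "\n", which is the delimiter.
            f.write(rec + "\n")
    return path


def read_run(path):
    """Yield the records of one run sequentially, decompressing on the fly."""
    with gzip.open(path, "rt", encoding="utf-8", newline="\n") as f:
        for line in f:
            yield line.rstrip("\n")


def external_sort(records, max_records, directory):
    """Yield records in sorted order, holding at most max_records in a batch."""
    runs, batch = [], []
    for rec in records:
        batch.append(rec)
        if len(batch) >= max_records:
            runs.append(write_run(batch, directory))
            batch = []
    if batch:
        runs.append(write_run(batch, directory))
    # Multi-way merge: each run is read strictly sequentially.
    yield from heapq.merge(*(read_run(p) for p in runs))
```

A caller would create a temporary directory, pass it as `directory`, and consume the generator fully before removing the directory. Because each run is read only sequentially during the merge, a stream compressor is sufficient in this sketch; it does not address the random-access limitation the abstract mentions, and it says nothing about whether compression pays off for a given data volume.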