The performance of difference coding for sets and relational tables

  • Authors:
  • Wei Biao Wu; Chinya V. Ravishankar

  • Affiliations:
  • University of Chicago, Chicago, Illinois; University of California, Riverside, Riverside, California

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 2003


Abstract

We characterize the performance of difference coding for compressing sets and database relations by analyzing the problem of estimating the number of bits needed to store the spacings between consecutive values in sorted sets of integers. We provide analytical expressions for estimating the effectiveness of difference coding when the elements of the sets, or the attribute fields in database tuples, are drawn from the uniform and Zipf distributions. We also examine the cases where a uniformly distributed domain is combined with a Zipf distribution or with an arbitrary distribution. We present limit theorems for most cases and probabilistic convergence results for the remaining ones. We also examine the effect of reordering the attribute domain on the compression ratio. Our simulations show excellent agreement with the theory.
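To make the idea concrete, the sketch below difference-codes a set of integers drawn uniformly from a domain of size 2^20. It sorts the set and stores the first value followed by the gaps between consecutive values, then compares the resulting size against storing every element at a fixed width. The choice of Rice (Golomb power-of-two) coding for the gaps is an assumption, as is the heuristic in the hypothetical helper `rice_parameter`. The paper's analysis concerns the number of bits needed for the spacings and may assume a different gap code. The Zipf and mixed-distribution cases treated in the paper are not covered here.

```python
import math
import random
from itertools import accumulate


def difference_encode(values):
    """Sort the distinct values; return the first value followed by successive gaps."""
    ordered = sorted(set(values))
    if not ordered:
        return []
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


def difference_decode(gaps):
    """Invert difference_encode by taking running sums of the gaps."""
    return list(accumulate(gaps))


def rice_parameter(gaps):
    """Hypothetical heuristic: pick b near log2(mean_gap * ln 2), which suits
    roughly geometric gaps such as those of a uniformly drawn set."""
    mean = sum(gaps) / len(gaps)
    target = mean * math.log(2)
    return math.floor(math.log2(target)) if target >= 1 else 0


def rice_encode(gap, b):
    """Rice code: quotient gap >> b in unary (ones ended by a zero),
    followed by the low b bits of the gap in binary."""
    q, r = gap >> b, gap & ((1 << b) - 1)
    remainder = format(r, f"0{b}b") if b > 0 else ""
    return "1" * q + "0" + remainder


def compressed_bits(values):
    """Total bits when the gaps are Rice-coded with a single shared parameter.
    (Storing b itself is ignored in this sketch.)"""
    gaps = difference_encode(values)
    b = rice_parameter(gaps)
    return sum(len(rice_encode(g, b)) for g in gaps)


def fixed_width_bits(values, domain_size):
    """Baseline: each distinct element stored in ceil(log2(domain_size)) bits."""
    width = max(1, math.ceil(math.log2(domain_size)))
    return len(set(values)) * width


if __name__ == "__main__":
    rng = random.Random(42)
    domain_size, k = 1 << 20, 1000
    sample = rng.sample(range(domain_size), k)
    gaps = difference_encode(sample)
    assert difference_decode(gaps) == sorted(sample)
    print("Rice-coded gaps, bits/element:", compressed_bits(sample) / k)
    print("Fixed-width,     bits/element:", fixed_width_bits(sample, domain_size) / k)
```

With 1000 elements in a domain of 2^20, the average gap is about 2^10. Each coded gap therefore costs roughly the width of a typical spacing plus a few bits of unary overhead, rather than the full 20-bit element width. Estimating this saving analytically, as a function of the domain size, the set size, and the value distribution, is the question the abstract describes.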