D-Zipfian: a decentralized implementation of Zipfian

  • Authors: Sumita Barahmand; Shahram Ghandeharizadeh

  • Affiliations: University of Southern California, Los Angeles, California (both authors)

  • Venue: Proceedings of the Sixth International Workshop on Testing Database Systems
  • Year: 2013

Abstract

The Zipfian distribution is used extensively to generate workloads to test, tune, and benchmark data stores. This paper presents a decentralized implementation of this technique, named D-Zipfian, using N parallel generators to issue requests. A request is a reference to a data item from a fixed population of data items. The challenge is for each generator to reference a disjoint set of data items. Moreover, the generators should finish at approximately the same time by performing work proportional to their processing capability. Intuitively, D-Zipfian assigns a total probability of 1/N to each of the N generators and requires each generator to reference its data items with a scaled probability. In the case of heterogeneous generators, the total probability of each generator is proportional to its processing capability. We demonstrate the effectiveness of D-Zipfian using empirical measurements of the chi-square statistic.
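
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how items are assigned to generators, so the greedy assignment below is an assumption, as are all function names, the exponent parameter, and the capability values. The sketch builds Zipfian probabilities over a fixed population, splits the items into disjoint sets whose total probability is roughly proportional to each generator's capability (1/N each when capabilities are equal), and rescales each set so that a generator samples from a proper distribution. A plain Pearson chi-square helper is included to compare observed reference counts with expected ones.

```python
import random
from itertools import accumulate


def zipf_probabilities(num_items, exponent=1.0):
    """Probability of each item, where the item at rank r has weight 1 / r**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def partition_items(probs, capabilities):
    """Split item indices into disjoint sets, one per generator.

    Assumption: a greedy rule that hands the next most popular item to the
    generator whose assigned probability is furthest below its target share.
    The paper's actual assignment rule is not given in the abstract.
    """
    total_capability = sum(capabilities)
    targets = [c / total_capability for c in capabilities]
    assigned = [0.0] * len(capabilities)
    partitions = [[] for _ in capabilities]
    for item in sorted(range(len(probs)), key=lambda i: -probs[i]):
        g = max(range(len(targets)), key=lambda k: targets[k] - assigned[k])
        partitions[g].append(item)
        assigned[g] += probs[item]
    return partitions, assigned


def scaled_distribution(items, probs):
    """Rescale the original probabilities of one generator's items to sum to 1."""
    share = sum(probs[i] for i in items)
    return [probs[i] / share for i in items]


def make_generator(items, probs, seed=None):
    """Return a function that draws one item per call from the scaled distribution."""
    rng = random.Random(seed)
    cumulative = list(accumulate(scaled_distribution(items, probs)))

    def next_item():
        return rng.choices(items, cum_weights=cumulative, k=1)[0]

    return next_item


def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical setup: 1000 items, three generators, the third twice as capable.
probs = zipf_probabilities(1000, exponent=1.0)
partitions, shares = partition_items(probs, capabilities=[1, 1, 2])
generators = [make_generator(p, probs, seed=k) for k, p in enumerate(partitions)]
```

In this sketch each generator would issue a number of requests proportional to its entry in `shares`, so that the combined stream approximates the original Zipfian distribution while no two generators ever reference the same item.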