Quantiles over data streams: an experimental study

  • Authors: Lu Wang; Ge Luo; Ke Yi; Graham Cormode

  • Affiliations: The Hong Kong University of Science and Technology, Hong Kong, Hong Kong; The Hong Kong University of Science and Technology, Hong Kong, Hong Kong; The Hong Kong University of Science and Technology, Hong Kong, Hong Kong; AT&T Labs -- Research, Florham Park, NJ, USA

  • Venue: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
  • Year: 2013

Abstract

A fundamental problem in data management and analysis is to generate descriptions of the distribution of data. It is most common to give such descriptions in terms of the cumulative distribution, which is characterized by the quantiles of the data. The design and engineering of efficient methods to find these quantiles has attracted much study, especially in the case where the data is described incrementally, and we must compute the quantiles in an online, streaming fashion. Yet while such algorithms have proved to be tremendously useful in practice, there has been limited formal comparison of the competing methods, and no comprehensive study of their performance. In this paper, we remedy this deficit by providing a taxonomy of different methods and describing efficient implementations. In doing so, we propose and analyze variations that have not been explicitly studied before, yet which turn out to perform the best. To illustrate this, we provide detailed experimental comparisons demonstrating the tradeoffs between space, time, and accuracy for quantile computation.
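
To make the streaming setting concrete, the following is a minimal sketch of one simple way to estimate quantiles in a single pass with bounded memory: keep a uniform random sample of the stream (reservoir sampling) and answer queries from the sorted sample. This is an illustration only and is not claimed to be any of the methods evaluated in the paper; the class name `ReservoirQuantiles` and its parameters `capacity` and `seed` are hypothetical names chosen for this example.

```python
import random


class ReservoirQuantiles:
    """Illustrative streaming quantile estimator based on reservoir sampling.

    Assumption: this is a generic textbook technique used here as an example,
    not the implementation studied in the paper.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity          # maximum number of items kept in memory
        self.sample = []                  # the current uniform sample of the stream
        self.count = 0                    # number of stream items seen so far
        self._rng = random.Random(seed)   # private RNG so runs can be reproduced

    def update(self, x):
        """Process one stream item in O(1) time."""
        self.count += 1
        if len(self.sample) < self.capacity:
            # Still filling the reservoir: keep every item.
            self.sample.append(x)
        else:
            # Keep the new item with probability capacity / count,
            # overwriting a uniformly chosen slot.
            j = self._rng.randrange(self.count)
            if j < self.capacity:
                self.sample[j] = x

    def query(self, phi):
        """Return an estimate of the phi-quantile, for phi in [0, 1]."""
        if not self.sample:
            raise ValueError("no items seen yet")
        ordered = sorted(self.sample)
        idx = min(len(ordered) - 1, int(phi * len(ordered)))
        return ordered[idx]
```

In this sketch the space is fixed by `capacity`, each update takes constant time, and the rank error of a query shrinks as `capacity` grows, which is a small instance of the space, time, and accuracy tradeoff the abstract refers to. While the stream is shorter than `capacity`, the sample holds every item and the answers are exact.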