Approximating and testing k-histogram distributions in sub-linear time

  • Authors:
  • Piotr Indyk; Reut Levi; Ronitt Rubinfeld

  • Affiliations:
  • Massachusetts Institute of Technology, Cambridge, MA, USA; Tel Aviv University, Tel Aviv, Israel; Massachusetts Institute of Technology, Cambridge, MA, USA

  • Venue:
  • PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
  • Year:
  • 2012

Abstract

A discrete distribution p over [n] is a k-histogram if its probability distribution function can be represented as a piecewise constant function with k pieces. Such a function is represented by a list of k intervals and k corresponding values. We consider the following problem: given a collection of samples from a distribution p, find a k-histogram that (approximately) minimizes the ℓ₂ distance to the distribution p. We give time- and sample-efficient algorithms for this problem. We further provide algorithms that distinguish distributions that have the property of being a k-histogram from distributions that are ε-far from any k-histogram in the ℓ₁ distance and ℓ₂ distance, respectively.
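
To make the definitions in the abstract concrete, the following is a minimal Python sketch of how a k-histogram might be stored as k intervals with k values, and how the ℓ₁ and ℓ₂ distances between two distributions over [n] are computed. It is not the paper's algorithm: the function names (histogram_pmf, empirical_pmf, l1_distance, l2_distance), the right-endpoint encoding of the intervals, and the example numbers are all assumptions chosen for illustration. The sketch also expands every distribution to length n, which takes linear time and is exactly what a sub-linear algorithm avoids.

```python
from collections import Counter


def histogram_pmf(right_endpoints, values, n):
    """Expand a k-histogram into an explicit pmf over [n] = {1, ..., n}.

    right_endpoints: sorted right endpoints of the k intervals (assumed
                     encoding); the last endpoint must equal n.
    values: probability assigned to EACH element of the matching interval.
    """
    if not right_endpoints or right_endpoints[-1] != n:
        raise ValueError("the intervals must cover [n] exactly")
    pmf = []
    left = 1
    for right, value in zip(right_endpoints, values):
        pmf.extend([value] * (right - left + 1))
        left = right + 1
    return pmf


def empirical_pmf(samples, n):
    """Empirical distribution over [n] from a list of samples in 1..n."""
    counts = Counter(samples)
    return [counts.get(i, 0) / len(samples) for i in range(1, n + 1)]


def l1_distance(p, q):
    """Sum of absolute differences between two pmfs of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p, q):
    """Euclidean distance between two pmfs of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Hypothetical example: a 2-histogram over [8] compared with the uniform
# distribution, which is itself a 1-histogram.
n = 8
two_piece = histogram_pmf([4, 8], [0.1875, 0.0625], n)
uniform = histogram_pmf([8], [0.125], n)

print(l1_distance(two_piece, uniform))
print(l2_distance(two_piece, uniform))
```

In this representation a k-histogram needs only 2k numbers regardless of n, which is why it is attractive as a compact summary of a distribution.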