Quality and efficiency for kernel density estimates in large data

  • Authors:
  • Yan Zheng;Jeffrey Jestes;Jeff M. Phillips;Feifei Li

  • Affiliations:
  • University of Utah, Salt Lake City, UT, USA;University of Utah, Salt Lake City, UT, USA;University of Utah, Salt Lake City, UT, USA;University of Utah, Salt Lake City, UT, USA

  • Venue:
  • Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Kernel density estimates are important for a broad variety of applications. Their construction has been well-studied, but existing techniques are expensive on massive datasets and/or only provide heuristic approximations without theoretical guarantees. We propose randomized and deterministic algorithms with quality guarantees which are orders of magnitude more efficient than previous algorithms. Our algorithms do not require knowledge of the kernel or its bandwidth parameter and are easily parallelizable. We demonstrate how to implement our ideas in a centralized setting and in MapReduce, although our algorithms are applicable to any large-scale data processing framework. Extensive experiments on large real datasets demonstrate the quality, efficiency, and scalability of our techniques.