Guessing the extreme values in a data set: a Bayesian method and its applications

  • Authors:
  • Mingxi Wu; Chris Jermaine

  • Affiliations:
  • Computer and Information Science and Engineering Department, University of Florida, Gainesville, USA 32611 (both authors)

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2009

Abstract

For a large number of data management problems, it would be useful to draw a few samples from a data set and use them to guess the largest (or smallest) value in the entire data set. Min/max online aggregation, top-k query processing, outlier detection, and distance joins are just a few of the possible applications. This paper details a statistically rigorous Bayesian approach to this problem. Just as importantly, we demonstrate the utility of the approach by showing how it can be applied to four specific problems that arise in data management.
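The abstract does not describe the paper's actual model, so what follows is only a toy illustration of the general idea of guessing a data set's maximum from a sample in a Bayesian way. It is not the authors' method. It rests on several assumptions of our own. The values are treated as i.i.d. exponential with an unknown rate λ, and λ gets a conjugate Gamma prior. The sample is a uniform random draw. The total size `n_total` is known. All names (`posterior_rates`, `prob_max_at_most`, `guess_max`, `prior_shape`, `prior_rate`) are hypothetical. The sketch draws λ from its Gamma posterior and averages P(all unseen values ≤ t | λ) over those draws. It then uses bisection to find the smallest t whose posterior probability of bounding the maximum reaches a chosen level.

```python
import math
import random

# ASSUMPTION (toy model, not the paper's): values are i.i.d. Exponential(lam),
# with a Gamma(prior_shape, prior_rate) prior on the rate lam.

def posterior_rates(sample, prior_shape=1.0, prior_rate=1.0, draws=5000, seed=0):
    """Draw rate parameters from the conjugate Gamma posterior."""
    rng = random.Random(seed)
    shape = prior_shape + len(sample)
    rate = prior_rate + sum(sample)
    # random.gammavariate takes (shape, scale), so pass scale = 1 / rate
    return [rng.gammavariate(shape, 1.0 / rate) for _ in range(draws)]

def prob_max_at_most(t, sample, n_total, lambdas):
    """Posterior probability that the largest of all n_total values is <= t."""
    if t < max(sample):
        return 0.0  # the data set max is at least the largest sampled value
    unseen = n_total - len(sample)
    total = 0.0
    for lam in lambdas:
        tail = math.exp(-lam * t)  # P(one unseen value > t | lam)
        if tail >= 1.0:
            continue  # this draw contributes probability 0
        # P(every unseen value <= t | lam), computed in log space for large counts
        total += math.exp(unseen * math.log1p(-tail))
    return total / len(lambdas)

def guess_max(sample, n_total, level=0.5, **kwargs):
    """Smallest t (via bisection) with P(data set max <= t) >= level, 0 < level < 1."""
    lambdas = posterior_rates(sample, **kwargs)
    lo = max(sample)
    if prob_max_at_most(lo, sample, n_total, lambdas) >= level:
        return lo  # e.g. the sample already covers the whole data set
    hi = max(2.0 * lo, 1.0)
    while prob_max_at_most(hi, sample, n_total, lambdas) < level:
        hi *= 2.0  # grow the bracket until it contains the quantile
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if prob_max_at_most(mid, sample, n_total, lambdas) >= level:
            hi = mid
        else:
            lo = mid
    return hi
```

For example, `guess_max(sample, 1_000_000, level=0.95)` gives an upper bound that, under the toy model, covers the true maximum with 95% posterior probability. `level=0.5` gives a posterior-median guess. Monte Carlo over posterior draws is used here instead of a closed-form sum because the exact expansion of E[(1 − e^(−λt))^M] alternates in sign and loses precision for large M.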