Relational confidence bounds are easy with the bootstrap

  • Authors:
  • Abhijit Pol;Christopher Jermaine

  • Affiliations:
  • University of Florida, Gainesville, FL;University of Florida, Gainesville, FL

  • Venue:
  • Proceedings of the 2005 ACM SIGMOD international conference on Management of data
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Statistical estimation and approximate query processing have become increasingly prevalent applications for database systems. However, approximation is usually of little use without some sort of guarantee on estimation accuracy, or "confidence bound." Analytically deriving probabilistic guarantees for database queries over sampled data is a daunting task, not suitable for the faint of heart, and certainly beyond the expertise of the typical database system end-user. This paper considers the problem of incorporating into a database system a powerful "plug-in" method for computing confidence bounds on the answer to relational database queries over sampled or incomplete data. This statistical tool, called the bootstrap, is simple enough that it can be used by a data-base programmer with a rudimentary mathematical background, but general enough that it can be applied to almost any statistical inference problem. Given the power and ease-of-use of the bootstrap, we argue that the algorithms presented for supporting the bootstrap should be incorporated into any database system which is intended to support analytic processing.