New Sampling-Based Estimators for OLAP Queries

  • Authors:
  • Ruoming Jin;Leo Glimcher;Chris Jermaine;Gagan Agrawal

  • Affiliations:
  • Kent State University;Ohio State University;University of Florida;Ohio State University

  • Venue:
  • ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

One important way in which sampling for approximate query processing in a database environment differs from traditional applications of sampling is that in a database, it is feasible to collect accurate summary statistics from the data in addition to the sample. This paper describes a set of sampling-based estimators for approximate query processing that make use of simple summary statistics to to greatly increase the accuracy of sampling-based estimators. Our estimators are able to give tight probabilistic guarantees on estimation accuracy. They are suitable for low or high dimensional data, and work with categorical or numerical attributes. Furthermore, the information used by our estimators can easily be gathered in a single pass, making them suitable for use in a streaming environment.