Optimizing Sample Design for Approximate Query Processing

Authors:
Philipp Rösch;Wolfgang Lehner
Affiliations:
Business Intelligence Practice, SAP Research, Dresden, Germany;Database Technology Research Group, Dresden University of Technology, Dresden, Germany
Venue:
International Journal of Knowledge-Based Organizations
Year:
2013

Citing 28
Cited 0

Random sampling with a reservoir

ACM Transactions on Mathematical Software (TOMS)
Physical database design for relational databases

ACM Transactions on Database Systems (TODS)
Improved histograms for selectivity estimation of range predicates

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Online aggregation

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
AutoAdmin “what-if” index analysis utility

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Wavelet-based histograms for selectivity estimation

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
On random sampling over joins

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Join synopses for approximate query answering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Ripple joins for online aggregation

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Congressional samples for approximate answering of group-by queries

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Automating Statistics Management for Query Optimizers

IEEE Transactions on Knowledge and Data Engineering
A Framework for the Physical Design Problem for Data Synopses

EDBT '02 Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology
Overcoming Limitations of Sampling for Aggregation Queries

Proceedings of the 17th International Conference on Data Engineering
Histogram-Based Approximation of Set-Valued Query-Answers

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Automated Selection of Materialized Views and Indexes in SQL Databases

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Approximate Query Processing Using Wavelets

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Dynamic sample selection for approximate query processing

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Selection of Views to Materialize in a Data Warehouse

IEEE Transactions on Knowledge and Data Engineering
A disk-based join with probabilistic guarantees

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Recommending Materialized Views and Indexes with IBM DB2 Design Advisor

ICAC '04 Proceedings of the First International Conference on Autonomic Computing
Scalable approximate query processing with the DBO engine

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Robustness in automatic physical database design

EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology
Linked Bernoulli Synopses: Sampling along Foreign Keys

SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
Sample synopses for approximate answering of group-by queries

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Designing Random Sample Synopses with Outliers

ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
A sample advisor for approximate query processing

ADBIS'10 Proceedings of the 14th east European conference on Advances in databases and information systems
HYRISE: a main memory hybrid storage engine

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

The rapid increase of data volumes makes sampling a crucial component of modern data management systems. Although there is a large body of work on database sampling, the problem of automatically determine the optimal sample for a given query remained almost unaddressed. To tackle this problem the authors propose a sample advisor based on a novel cost model. Primarily designed for advising samples of a few queries specified by an expert, the authors additionally propose two extensions of the sample advisor. The first extension enhances the applicability by utilizing recorded workload information and taking memory bounds into account. The second extension increases the effectiveness by merging samples in case of overlapping pieces of sample advice. For both extensions, the authors present exact and heuristic solutions. Within their evaluation, the authors analyze the properties of the cost model and demonstrate the effectiveness and the efficiency of the heuristic solutions with a variety of experiments.