Derby/S: a DBMS for sample-based query answering

Authors:
Anja Klein;Rainer Gemulla;Philipp Rösch;Wolfgang Lehner
Affiliations:
TU Dresden;TU Dresden;TU Dresden;TU Dresden
Venue:
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Year:
2006

Citing 7
Cited 2

Congressional samples for approximate answering of group-by queries

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Overcoming Limitations of Sampling for Aggregation Queries

Proceedings of the 17th International Conference on Data Engineering
Aqua: A Fast Decision Support Systems Using Approximate Query Answers

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Large-Sample and Deterministic Confidence Intervals for Online Aggregation

SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
Dynamic sample selection for approximate query processing

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Robust estimation with sampling and approximate pre-aggregation

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Deferred maintenance of disk-based random samples

EDBT'06 Proceedings of the 10th international conference on Advances in Database Technology

Sample synopses for approximate answering of group-by queries

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Turbo-charging estimate convergence in DBO

Proceedings of the VLDB Endowment

Quantified Score

Hi-index	0.00

Visualization

Abstract

Although approximate query processing is a prominent way to cope with the requirements of data analysis applications, current database systems do not provide integrated and comprehensive support for these techniques. To improve this situation, we propose an SQL extension---called SQL/S---for approximate query answering using random samples, and present a prototypical implementation within the engine of the open-source database system Derby---called Derby/S. Our approach significantly reduces the required expert knowledge by enabling the definition of samples in a declarative way; the choice of the specific sampling scheme and its parametrization is left to the system. SQL/S introduces new DDL commands to easily define and administrate random samples subject to a given set of optimization criteria. Derby/S automatically takes care of sample maintenance if the underlying dataset changes. Finally, samples are transparently used during query processing, and error bounds are provided. Our extensions do not affect traditional queries and provide the means to integrate sampling as a first-class citizen into a DBMS.