The rapid growth of current data warehouse systems makes random sampling a crucial component of modern data management systems. Although there is a large body of work on database sampling, the problem of automatic sample selection has remained largely unaddressed. In this paper, we tackle this problem with a sample advisor. We propose a cost model to evaluate a sample for a given query. Based on this model, our sample advisor determines the optimal set of samples for a given set of queries specified by an expert. We further propose an extension that utilizes recorded workload information. In this case, the sample advisor takes both the set of queries and a given memory bound into account when computing a sample advice. Additionally, we consider merging samples when sample advice overlaps, and present both an exact and a heuristic solution. In our evaluation, we analyze the properties of the cost model and compare the proposed algorithms. We further demonstrate the effectiveness and efficiency of the heuristic solutions in a variety of experiments.
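To make the memory-bounded selection step more concrete, the following is a minimal sketch of one generic way to choose samples under a memory bound: a greedy pass ordered by benefit per unit of memory. It is not the paper's cost model or algorithm, which the abstract does not specify. The `Candidate` type, the `greedy_advice` function, the sample names, and all sizes and benefit values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate sample (hypothetical structure, not from the paper)."""
    name: str
    size: int       # memory footprint in arbitrary units; must be > 0
    benefit: float  # assumed estimated cost reduction over the workload


def greedy_advice(candidates, memory_bound):
    """Pick candidates in order of benefit per unit of memory.

    A candidate is taken only if it still fits into the remaining memory.
    This is a generic knapsack-style heuristic and may miss the optimum.
    """
    chosen, used = [], 0
    ranked = sorted(candidates, key=lambda c: c.benefit / c.size, reverse=True)
    for c in ranked:
        if used + c.size <= memory_bound:
            chosen.append(c)
            used += c.size
    return chosen, used


# Illustrative input only: names, sizes, and benefits are made up.
candidates = [
    Candidate("orders_1pct", size=40, benefit=8.0),
    Candidate("lineitem_1pct", size=70, benefit=10.5),
    Candidate("customer_5pct", size=30, benefit=3.0),
]
advice, used = greedy_advice(candidates, memory_bound=100)
print([c.name for c in advice], used)
```

With these made-up numbers, the greedy pass takes `orders_1pct` first, skips `lineitem_1pct` because it no longer fits, and then adds `customer_5pct`. An exact solution would instead search over all feasible subsets, which illustrates why an exact and a heuristic variant can return different advice.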