The ability to estimate the resource consumption of SQL queries is crucial for a number of tasks in a database system, such as admission control, query scheduling, and costing during query optimization. Recent work has explored the use of statistical techniques for resource estimation in place of the manually constructed cost models used in query optimization. Such techniques, which are trained on examples of observed query resource usage, promise superior estimation accuracy because they can account for factors such as the hardware characteristics of the system or bias in cardinality estimates. However, the proposed approaches lack robustness: they do not generalize well to queries that differ from the training examples, which results in significant estimation errors. Our approach aims to address this problem by combining knowledge of database query processing with statistical models. We model resource usage at the level of individual operators, with different models and features for each operator type, and explicitly model the asymptotic behavior of each operator. This results in significantly better estimation accuracy and the ability to estimate the resource usage of arbitrary plans, even when they are very different from the training instances. We validate our approach using various large-scale real-life and benchmark workloads on Microsoft SQL Server.
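To make the idea of operator-level models with explicit asymptotic behavior concrete, here is a minimal sketch in Python. It is not the method from the paper: the operator types, the scaling functions in `SCALING`, the single-coefficient least-squares fit, and the training numbers are all assumptions chosen for illustration. Each operator type gets its own model of the form cost ≈ a · g(n), where g encodes the assumed growth rate (linear for a scan, n log n for a sort), and a plan estimate is the sum of its operators' estimates.

```python
import math
from collections import defaultdict

# Hypothetical asymptotic scaling function per operator type
# (illustrative assumptions, not taken from the paper).
SCALING = {
    "scan": lambda n: n,
    "sort": lambda n: n * math.log2(n) if n > 1 else n,
}


def fit_operator_models(samples):
    """Fit cost ~= a * g(n) per operator type by least squares through the origin.

    samples: iterable of (op_type, input_rows, observed_cost).
    Returns a dict mapping op_type to its fitted coefficient a.
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for op, n, cost in samples:
        g = SCALING[op](n)
        num[op] += g * cost
        den[op] += g * g
    return {op: num[op] / den[op] for op in num}


def estimate_plan(models, plan):
    """Sum per-operator estimates for a plan given as [(op_type, input_rows), ...]."""
    return sum(models[op] * SCALING[op](n) for op, n in plan)


# Made-up training observations on small inputs.
training = [
    ("scan", 1_000, 2.0),
    ("scan", 10_000, 20.0),
    ("sort", 1_024, 10.24),
    ("sort", 4_096, 49.152),
]
models = fit_operator_models(training)

# Estimate a plan whose inputs are far larger than anything in the training set.
estimate = estimate_plan(models, [("scan", 1_000_000), ("sort", 1_000_000)])
```

Because the growth rate is built into each operator's model rather than learned from whole-query examples, the sketch can extrapolate to input sizes and operator combinations that never appeared in training, which is the kind of robustness the abstract argues for.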