Robust estimation of resource consumption for SQL queries using statistical techniques

Authors:
Jiexing Li;Arnd Christian König;Vivek Narasayya;Surajit Chaudhuri
Affiliations:
University of Wisconsin - Madison, Madison, WI;Microsoft Research, Redmond, WA;Microsoft Research, Redmond, WA;Microsoft Research, Redmond, WA
Venue:
Proceedings of the VLDB Endowment
Year:
2012



Abstract

The ability to estimate the resource consumption of SQL queries is crucial for a number of tasks in a database system, such as admission control, query scheduling, and costing during query optimization. Recent work has explored the use of statistical techniques for resource estimation in place of the manually constructed cost models used in query optimization. Such techniques, which require examples of the resource usage of queries as training data, promise superior estimation accuracy because they can account for factors such as the hardware characteristics of the system or bias in cardinality estimates. However, the proposed approaches lack robustness: they do not generalize well to queries that differ from the training examples, which results in significant estimation errors. Our approach aims to address this problem by combining knowledge of database query processing with statistical models. We model resource usage at the level of individual operators, with different models and features for each operator type, and explicitly model the asymptotic behavior of each operator. This results in significantly better estimation accuracy and the ability to estimate the resource usage of arbitrary plans, even when they are very different from the training instances. We validate our approach using various large-scale real-life and benchmark workloads on Microsoft SQL Server.
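
As a rough illustration of the operator-level idea in the abstract, the Python sketch below fits one small model per operator type and scales each prediction by an assumed asymptotic cost function. It is not the paper's models or feature set. The operator names, the scaling functions (linear for a scan, n log n for a sort), the synthetic training numbers, and the helpers `fit_operator_models` and `estimate_plan` are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict

# Assumed asymptotic scaling per operator type (illustrative, not from the paper).
SCALING = {
    "scan": lambda n: n,
    "sort": lambda n: n * math.log2(max(n, 2)),
}

def fit_operator_models(training):
    """Fit one coefficient per operator type by least squares through the origin.

    training: iterable of (op_type, input_rows, observed_cpu) tuples.
    Returns {op_type: coefficient} so that cpu ~= coefficient * SCALING[op](rows).
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for op, rows, cpu in training:
        g = SCALING[op](rows)
        num[op] += g * cpu
        den[op] += g * g
    return {op: num[op] / den[op] for op in num}

def estimate_plan(models, plan):
    """Sum per-operator estimates; plan is a list of (op_type, input_rows)."""
    return sum(models[op] * SCALING[op](rows) for op, rows in plan)

# Synthetic training observations on small inputs (made-up numbers).
training = [
    ("scan", 1_000, 2.0),
    ("scan", 4_000, 8.0),
    ("sort", 1_024, 10.24),
    ("sort", 4_096, 49.152),
]
models = fit_operator_models(training)

# A plan whose inputs are far larger than anything seen in training.
plan = [("scan", 1_000_000), ("sort", 1_048_576)]
estimate = estimate_plan(models, plan)
print(models, estimate)
```

Because the growth rate is built into each operator's model rather than learned from data, the estimate for the large plan extrapolates along the assumed curve instead of being capped by the range of the training examples. A purely data-driven model with no such structure would have no basis for that extrapolation.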