Parallel database systems: the future of high performance database systems
Communications of the ACM
Eddies: continuously adaptive query processing
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Independence is good: dependency-based histogram synopses for high-dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Selectivity estimation using probabilistic models
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
On the Complexity of Generating Optimal Left-Deep Processing Trees with Cross Products
ICDT '95 Proceedings of the 5th International Conference on Database Theory
Optimization of Nonrecursive Queries
VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
Learning statistical models from relational data
Learning statistical models from relational data
An initial study of overheads of eddies
ACM SIGMOD Record
Exploiting Correlated Attributes in Acquisitional Query Processing
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Content-based routing: different plans for different data
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Selectivity-based partitioning: a divide-and-union paradigm for effective query optimization
Proceedings of the 14th ACM international conference on Information and knowledge management
Lifting the burden of history from adaptive query processing
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Foundations and Trends in Databases
Self-tuning query mesh for adaptive multi-route query processing
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Query mesh: multi-route query processing technology
Proceedings of the VLDB Endowment
Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning
Optimizing join queries on the basis of average selectivities is suboptimal in highly correlated databases. In such databases, relations are naturally divided into partitions, each with substantially different statistical characteristics. It is therefore compelling to discover these data partitions during query optimization and to create multiple plans for a given query, each plan being optimal for a particular combination of data partitions. This scenario calls for sharing state among plans, so that common intermediate results are not recomputed. We study this problem in the setting of a routing-based query execution engine built on eddies [1]. Eddies naturally encapsulate horizontal partitioning and maximal state sharing across multiple plans. We define the notion of a conditional join plan, a novel representation of the search space that enables us to address the problem in a principled way. We present a low-overhead greedy algorithm that uses statistical summaries based on graphical models. Experimental results suggest execution times an order of magnitude faster than traditional optimization under high correlation, with the same performance under low correlation.
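The claim that average selectivities mislead the optimizer on correlated data can be illustrated with a toy calculation. The sketch below is not the paper's algorithm or cost model. It assumes a hypothetical pipeline of two operators, `A` and `B`, a unit cost per tuple processed, and two made-up partitions whose selectivities are mirror images of each other. It also ignores joins proper and the sharing of state among plans.

```python
from itertools import permutations

def pipeline_cost(order, n_tuples, selectivity, unit_cost=1.0):
    """Cost of pushing n_tuples through the operators in the given order.

    Each operator charges unit_cost per tuple it sees, then passes on
    only the fraction of tuples given by its selectivity.
    """
    cost, surviving = 0.0, float(n_tuples)
    for op in order:
        cost += surviving * unit_cost
        surviving *= selectivity[op]
    return cost

def best_order(ops, n_tuples, selectivity):
    """Exhaustively pick the cheapest operator order (fine for tiny examples)."""
    return min(permutations(ops),
               key=lambda order: pipeline_cost(order, n_tuples, selectivity))

# Hypothetical partitions: A is selective on p1, B is selective on p2.
partitions = {
    "p1": {"n": 1000, "sel": {"A": 0.1, "B": 0.9}},
    "p2": {"n": 1000, "sel": {"A": 0.9, "B": 0.1}},
}
ops = ("A", "B")

# Size-weighted average selectivities, as a single-plan optimizer would see them.
total = sum(p["n"] for p in partitions.values())
avg_sel = {
    op: sum(p["n"] * p["sel"][op] for p in partitions.values()) / total
    for op in ops
}

# One plan chosen from the averages, then charged at each partition's true selectivities.
single_order = best_order(ops, total, avg_sel)
single_plan_cost = sum(
    pipeline_cost(single_order, p["n"], p["sel"]) for p in partitions.values()
)

# One plan per partition, each chosen from that partition's own selectivities.
per_partition_cost = sum(
    pipeline_cost(best_order(ops, p["n"], p["sel"]), p["n"], p["sel"])
    for p in partitions.values()
)

print(f"single plan:    {single_plan_cost:.0f}")
print(f"per-partition:  {per_partition_cost:.0f}")
```

Under these assumed numbers the averages make both operators look equally selective, so no single order can suit both partitions, while choosing an order per partition lowers the total cost. The gap widens as the partitions' selectivities diverge and vanishes when they coincide.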