Multi-route query processing and optimization

Authors:
Rimma V. Nehme;Karen Works;Chuan Lei;Elke A. Rundensteiner;Elisa Bertino
Affiliations:
Microsoft Research Lab., WI, United States;Worcester Polytechnic Institute, MA, United States;Worcester Polytechnic Institute, MA, United States;Worcester Polytechnic Institute, MA, United States;Purdue University, IN, United States
Venue:
Journal of Computer and System Sciences
Year:
2013

Abstract

A modern query optimizer typically picks a single query plan for all data based on overall data statistics. However, many have observed that real-life datasets tend to have non-uniform distributions, so selecting a single query plan may result in ineffective query execution for possibly large portions of the actual data. In addition, most stream query processing systems, given the volume of data, cannot precisely model the system state, much less account for uncertainty due to continuous variations. Such systems select a single query plan based upon imprecise statistics. In this paper, we present "Query Mesh" (or QM), a practical alternative to state-of-the-art data stream processing approaches. The main idea of QM is to compute multiple routes (i.e., query plans), each designed for a particular subset of the data with distinct statistical properties. We use the terms "plans" and "routes" interchangeably in our work. A classifier model is induced and used to assign the best route to each incoming tuple based upon its data characteristics. We formulate the QM search space and analyze its complexity. Because the search space is substantial, we propose several cost-based query optimization heuristics designed to find near-optimal QMs effectively. We propose the Self-Routing Fabric (SRF) infrastructure, which supports query execution with multiple plans without physically constructing their topologies or using a central router such as Eddy. We also consider how to support uncertain route specification and execution in QM, which can occur when imprecise statistics lead to more than one optimal route for a subset of the data. Our experimental results indicate that QM consistently provides better query execution performance and incurs negligible overhead compared to the alternative state-of-the-art data stream approaches.
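
The multi-route idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the operators, the route names, the `region` attribute, and the hand-written `classify` rule are all hypothetical stand-ins (the paper induces a classifier model, and the Self-Routing Fabric is not modeled here). The sketch only shows how assigning each tuple its own operator ordering can reduce the number of predicate evaluations compared with one fixed plan.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]

# Hypothetical filter operators (assumptions for illustration only).
OPERATORS: Dict[str, Predicate] = {
    "a_small": lambda r: r["a"] < 10,
    "b_even": lambda r: r["b"] % 2 == 0,
}

# Two routes (plans): the same operators in different orders.
ROUTES: Dict[str, Tuple[str, ...]] = {
    "route_a_first": ("a_small", "b_even"),
    "route_b_first": ("b_even", "a_small"),
}


def classify(row: Row) -> str:
    """Stand-in for an induced classifier: a hand-written rule that
    picks a route from a tuple attribute (hypothetical 'region')."""
    return "route_b_first" if row["region"] == 1 else "route_a_first"


def run_route(row: Row, route: Tuple[str, ...]) -> Tuple[bool, int]:
    """Apply the operators in route order, stopping at the first failure.
    Returns (passed, number of operators evaluated)."""
    evaluated = 0
    for name in route:
        evaluated += 1
        if not OPERATORS[name](row):
            return False, evaluated
    return True, evaluated


def process_multi_route(rows: List[Row]) -> Tuple[List[Row], int]:
    """Route each tuple through the plan chosen by the classifier."""
    results, total = [], 0
    for row in rows:
        passed, n = run_route(row, ROUTES[classify(row)])
        total += n
        if passed:
            results.append(row)
    return results, total


def process_single_plan(rows: List[Row], route_name: str) -> Tuple[List[Row], int]:
    """Baseline: every tuple uses the same plan."""
    results, total = [], 0
    for row in rows:
        passed, n = run_route(row, ROUTES[route_name])
        total += n
        if passed:
            results.append(row)
    return results, total


if __name__ == "__main__":
    rows = [
        {"a": 5, "b": 3, "region": 1},
        {"a": 50, "b": 2, "region": 0},
        {"a": 5, "b": 2, "region": 0},
    ]
    print(process_multi_route(rows))
    print(process_single_plan(rows, "route_a_first"))
```

Both functions return the same result set; only the amount of work differs. In this toy input the per-tuple routing evaluates fewer predicates than the fixed plan because each tuple meets its most selective filter first.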