Self-tuning query mesh for adaptive multi-route query processing

Authors:
Rimma V. Nehme;Elke A. Rundensteiner;Elisa Bertino
Affiliations:
Purdue University, West Lafayette, IN;Worcester Polytechnic Institute, Worcester, MA;Purdue University, West Lafayette, IN
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

Abstract

In real-life applications, different subsets of data may have distinct statistical properties: for example, websites may have widely differing visitation rates, and different categories of stocks may show dissimilar price fluctuation patterns. For such applications, it can be fruitful to drop the commonly made single-execution-plan assumption and instead execute a query using several plans, each optimally serving a subset of data with particular statistical properties. Furthermore, in dynamic environments, data properties may change continuously, thus calling for adaptivity. The intriguing question is: can we have an execution strategy that (1) is plan-based, to leverage all the benefits of traditional plan-based systems, (2) supports multiple plans, each customized for a different subset of data, and yet (3) is as adaptive as "plan-less" systems like Eddies? While the recently proposed Query Mesh (QM) approach provides a foundation for such an execution paradigm, it does not address the question of adaptivity required for highly dynamic environments. In this work, we fill this gap by proposing the Self-Tuning Query Mesh (ST-QM), an adaptive solution for content-based multi-plan execution engines. ST-QM addresses adaptive query processing by abstracting it as a concept drift problem, a well-known subject in machine learning. This abstraction allows adaptivity candidates (i.e., cases indicating a change in the environment) to be discarded early in the process if they are insignificant or not "worthwhile" to adapt to, thus minimizing the adaptivity overhead. A unique feature of our approach is that all logical transformations to the execution strategy are translated into a single inexpensive physical operation: the classifier change. Our experimental evaluation using a continuous query engine shows the performance benefits of the ST-QM approach over the alternatives, namely the non-adaptive and the Eddies-based solutions.
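
The abstract gives no algorithmic detail, so the following is only a toy sketch of the general idea, not the authors' method. It assumes a classifier that maps each tuple to one of several routes (operator orderings), and it treats a retrained classifier as an adaptivity candidate that is adopted only if it lowers the processing cost of a recent sample by a minimum fraction. All names, operator costs, predicates, and the `min_gain` threshold are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Classifier = Callable[[Row], str]  # maps a tuple to a route label

# Hypothetical operators: name -> (per-tuple cost, filter predicate).
OPERATORS = {
    "price_filter": (4.0, lambda r: r["price"] > 100),
    "volume_filter": (1.0, lambda r: r["volume"] > 1000),
}

# Hypothetical routes: each is one ordering of the same operators.
ROUTES: Dict[str, List[str]] = {
    "price_first": ["price_filter", "volume_filter"],
    "volume_first": ["volume_filter", "price_filter"],
}


def route_cost(row: Row, route: List[str]) -> float:
    """Cost of pushing one tuple along a route; stops at the first failed filter."""
    cost = 0.0
    for name in route:
        op_cost, predicate = OPERATORS[name]
        cost += op_cost
        if not predicate(row):
            break
    return cost


def workload_cost(rows: List[Row], classifier: Classifier) -> float:
    """Total cost of a sample when each tuple follows its classified route."""
    return sum(route_cost(r, ROUTES[classifier(r)]) for r in rows)


def maybe_swap(current: Classifier, candidate: Classifier,
               recent_rows: List[Row], min_gain: float = 0.10) -> Classifier:
    """Adopt the candidate only if its relative cost saving reaches min_gain."""
    old = workload_cost(recent_rows, current)
    new = workload_cost(recent_rows, candidate)
    gain = (old - new) / old if old > 0 else 0.0
    return candidate if gain >= min_gain else current


def current(row: Row) -> str:
    # Assumed starting point: every tuple takes the same route.
    return "price_first"


def candidate(row: Row) -> str:
    # Assumed retrained classifier: routes by a content attribute.
    return "volume_first" if row["sector"] == "utilities" else "price_first"


utilities = {"sector": "utilities", "price": 150, "volume": 200}
tech = {"sector": "tech", "price": 50, "volume": 5000}

drifted_sample = [utilities] * 3 + [tech]   # many tuples favour the new route
stable_sample = [utilities] + [tech] * 20   # saving is too small to matter

active_after_drift = maybe_swap(current, candidate, drifted_sample)
active_when_stable = maybe_swap(current, candidate, stable_sample)
```

In this sketch the only change made at adaptation time is replacing the active classifier; the routes and operators stay as they are. On the drifted sample the candidate is adopted, while on the stable sample the saving falls below the threshold and the current classifier is kept.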