Cluster-and-conquer: hierarchical multi-metric query processing in large-scale database federations

Authors:
Di Wang;Murali Mani;Elke A. Rundensteiner
Affiliations:
Worcester Polytechnic Institute, Worcester, MA;University of Michigan, Flint, Flint, MI;Worcester Polytechnic Institute, Worcester, MA
Venue:
Proceedings of the Fourteenth International Database Engineering & Applications Symposium
Year:
2010

Abstract

The federated database architecture was introduced to preserve the autonomy of individual data sources while still accomplishing federated tasks for diverse applications, from traditional enterprises to computational sciences. We identify two challenging query optimization problems in large-scale database federation systems. First, run-time conditions of data sources have a profound effect on the performance of database federations, yet the distributed environment makes it prohibitively expensive for the optimizer to gather rapidly fluctuating run-time conditions from remote data sources. Second, large-scale database federation systems are often widely distributed and built on heterogeneous networks, so efficient utilization of network resources is increasingly important for query scheduling. In this paper, we propose to exploit the clustered hierarchical structure of database federations to solve these two problems. Our Cluster-and-Conquer strategy coordinates hierarchical clusters of data sources to optimize and process queries cooperatively. Within each cluster, we employ an I/O-bound cost model, where run-time conditions are accessible with relatively little delay. Among clusters, we instead use a network-bound cost model to capture network heterogeneity and optimize query plans for efficient network utilization. An experimental study on a prototype database federation system with real-world network settings shows the effectiveness of our Cluster-and-Conquer strategy for scheduling data-intensive queries and demonstrates its performance benefits over existing state-of-the-art solutions.
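
The two-level costing idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's cost model: the abstract gives no formulas, so the load factor, the latency-plus-transfer-time estimate, the page size, and every name below (`Source`, `Link`, `shipping_cost`, `cheapest_destination`) are assumptions chosen only to show how a planner might switch between an I/O-bound estimate inside a cluster and a network-bound estimate across clusters.

```python
from dataclasses import dataclass

PAGE_BYTES = 8192  # assumed page size, not taken from the paper


@dataclass(frozen=True)
class Source:
    name: str
    cluster: str
    io_load: float  # hypothetical run-time load factor: 0.0 = idle, 1.0 = I/O takes twice as long


@dataclass(frozen=True)
class Link:
    bandwidth_bytes_per_s: float
    latency_s: float


def io_bound_cost(pages: int, target: Source, page_io_s: float = 0.001) -> float:
    """Intra-cluster estimate: page I/O time scaled by the target's current load."""
    return pages * page_io_s * (1.0 + target.io_load)


def network_bound_cost(pages: int, link: Link) -> float:
    """Inter-cluster estimate: link latency plus transfer time over the link."""
    return link.latency_s + pages * PAGE_BYTES / link.bandwidth_bytes_per_s


def shipping_cost(pages: int, src: Source, dst: Source, links: dict) -> float:
    """Pick the cost model by whether src and dst share a cluster."""
    if src.cluster == dst.cluster:
        return io_bound_cost(pages, dst)
    # Links are keyed by the unordered pair of cluster names.
    return network_bound_cost(pages, links[frozenset((src.cluster, dst.cluster))])


def cheapest_destination(pages: int, src: Source, candidates: list, links: dict) -> Source:
    """Return the candidate with the lowest estimated cost of receiving the data."""
    return min(candidates, key=lambda dst: shipping_cost(pages, src, dst, links))
```

For example, with sources `a` and `b` in one cluster and `c` in another, `cheapest_destination(1000, a, [b, c], links)` compares an I/O-bound estimate for `b` against a network-bound estimate for `c`. A busy local peer can still win when the inter-cluster link is slow.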