The federated database architecture was introduced to preserve the autonomy of individual data sources while still accomplishing federated tasks for diverse applications, from traditional enterprises to the computational sciences. We identify two challenging problems of query optimization in large-scale database federation systems. First, the run-time conditions of data sources have a profound effect on the performance of database federations, yet the distributed environment of a database federation makes it prohibitively expensive for the optimizer to gather rapidly fluctuating run-time conditions from remote data sources. Second, large-scale database federation systems are often widely distributed and built on heterogeneous networks, so using network resources efficiently is increasingly important for query scheduling. In this paper, we propose to exploit the clustered hierarchical structure of database federations to solve these two problems. Our Cluster-and-Conquer strategy coordinates hierarchical clusters of data sources to optimize and process queries cooperatively. Within each cluster, we employ an I/O-bound cost model, in which run-time conditions are accessible with relatively little delay. Among clusters, we instead use a network-bound cost model that captures network heterogeneity and optimizes query plans for efficient network utilization. An experimental study on a prototype database federation system with real-world network settings shows that Cluster-and-Conquer is effective for scheduling data-intensive queries and demonstrates its performance benefits over existing state-of-the-art solutions.
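The abstract does not give the cost formulas, so the following is only a minimal sketch of the two-level idea, assuming a simple load-inflated I/O model inside a cluster and a latency-plus-bandwidth transfer model between clusters. All names (`Source`, `Link`, `intra_cluster_cost`, `inter_cluster_cost`, `choose_join_site`), parameters, and formulas are hypothetical illustrations, not the paper's Cluster-and-Conquer optimizer. The sketch picks the cluster at which to join subquery results by adding local I/O cost to the cost of shipping every remote result to that cluster.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A data source inside a cluster. All fields are hypothetical."""
    cluster: str
    pages: int             # pages the local subquery must read
    io_ms_per_page: float  # nominal I/O time per page, in ms
    load: float            # run-time load factor in [0, 1); cheap to observe inside a cluster
    result_mb: float       # size of the subquery result that may be shipped, in MB


@dataclass
class Link:
    """An inter-cluster network link. Hypothetical parameters."""
    bandwidth_mb_s: float  # MB per second
    latency_ms: float


def intra_cluster_cost(src: Source) -> float:
    """Assumed I/O-bound cost (ms): nominal I/O time inflated by current load."""
    return src.pages * src.io_ms_per_page / (1.0 - src.load)


def inter_cluster_cost(mb: float, link: Link) -> float:
    """Assumed network-bound cost (ms): latency plus transfer time over the link."""
    return link.latency_ms + mb / link.bandwidth_mb_s * 1000.0


def choose_join_site(sources, links, candidate_sites):
    """Return (site, cost) minimizing local I/O cost plus the total time to ship
    every remote subquery result to the site. Results already at the site are free.
    Transfers are summed, i.e. treated as sequential (a simplifying assumption)."""
    io_cost = sum(intra_cluster_cost(s) for s in sources)
    best_site, best_cost = None, float("inf")
    for site in candidate_sites:
        net_cost = sum(
            0.0 if s.cluster == site
            else inter_cluster_cost(s.result_mb, links[(s.cluster, site)])
            for s in sources
        )
        total = io_cost + net_cost
        if total < best_cost:
            best_site, best_cost = site, total
    return best_site, best_cost


# Usage with made-up numbers: a large result in "east", a small one in "west".
sources = [
    Source("east", pages=1000, io_ms_per_page=0.1, load=0.5, result_mb=200.0),
    Source("west", pages=500, io_ms_per_page=0.1, load=0.0, result_mb=10.0),
]
links = {
    ("east", "west"): Link(bandwidth_mb_s=10.0, latency_ms=50.0),
    ("west", "east"): Link(bandwidth_mb_s=10.0, latency_ms=50.0),
}
site, cost = choose_join_site(sources, links, ["east", "west"])
print(site, cost)  # east 1300.0
```

Here the join lands in `east` because shipping the 10 MB result from `west` is far cheaper than shipping 200 MB the other way. Separating the two cost functions mirrors the abstract's split: load is read locally within a cluster, while only link properties matter across clusters.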