Enterprises may run multiple database systems spread across the organization, either for redundancy or to serve different applications. In such systems, query workloads can be distributed across servers for better performance. A materialized view, or Materialized Query Table (MQT), is an auxiliary table of pre-computed data that can significantly improve the performance of a database query. In this paper, we propose a framework for coordinating the execution of OLAP query workloads across a database cluster with a shared-nothing architecture. Such coordination is complex because we need to consider (1) the time to build the MQTs, (2) the impact of the MQTs on query execution, (3) whether the MQTs fit within the available disk space, (4) the computation power of each server, and (5) the effectiveness of the scheduling and placement algorithms in deriving a combination of configurations that completes the workload in the shortest time. We frame the problem as a combinatorial problem whose solution space is exponential in the number of queries, MQTs, and servers. We provide a stochastic search heuristic that finds a near-optimal mapping of queries to servers and MQTs to servers within an arbitrarily bounded time, and we compare our solution with an exhaustive search and three standard greedy algorithms. Our search implementation produced schedules within 9% of the optimum found through exhaustive search, and it produced better solutions than typical greedy algorithms on both TPC-H and synthetic benchmarks under a variety of experiments. In a key trial where disk space is limited, it produced results 15% better than the next best competitor, corresponding to an absolute wall-clock advantage of over 10 hours.
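The abstract does not specify the cost model or the search procedure, so the following is only a toy illustration of the kind of problem described: assign queries to servers, decide which MQTs to build on which servers, respect per-server disk limits, and minimize the completion time of the slowest server. Everything here is an assumption made for illustration: the numbers, the names (`QUERY_TIME`, `MQT_FACTOR`, `makespan`, `search`), the simple multiplicative speedup model, and the use of a randomized local search with an iteration budget in place of whatever stochastic heuristic the paper actually uses.

```python
import random

# Hypothetical toy instance; all values are made up for illustration.
QUERY_TIME = [40.0, 30.0, 25.0, 20.0, 10.0]   # baseline run time of each query
SERVER_SPEED = [1.0, 2.0]                      # relative computation power
SERVER_DISK = [10.0, 6.0]                      # disk space available for MQTs
MQT_BUILD = [15.0, 8.0]                        # time to build each MQT
MQT_SIZE = [6.0, 5.0]                          # disk space each MQT needs
# MQT_FACTOR[m][q]: multiplier applied to query q when MQT m is on its server.
MQT_FACTOR = [{0: 0.2, 1: 0.5}, {2: 0.3, 3: 0.4}]


def makespan(assign, placed):
    """Completion time of the slowest server, or None if a disk limit is exceeded.

    assign[q] is the server that runs query q; placed is a set of
    (mqt, server) pairs saying which MQTs are built where.
    """
    worst = 0.0
    for s in range(len(SERVER_SPEED)):
        local = [m for (m, srv) in placed if srv == s]
        if sum(MQT_SIZE[m] for m in local) > SERVER_DISK[s]:
            return None
        work = sum(MQT_BUILD[m] for m in local)  # pay the build cost first
        for q, srv in enumerate(assign):
            if srv == s:
                # A query uses the most helpful MQT present on its server.
                factor = min([MQT_FACTOR[m].get(q, 1.0) for m in local], default=1.0)
                work += QUERY_TIME[q] * factor
        worst = max(worst, work / SERVER_SPEED[s])
    return worst


def search(iterations=2000, seed=0):
    """Randomized local search bounded by a fixed number of iterations."""
    rng = random.Random(seed)
    n_servers = len(SERVER_SPEED)
    assign = [rng.randrange(n_servers) for _ in QUERY_TIME]
    placed = set()                      # start with no MQTs, always feasible
    best = makespan(assign, placed)
    start = best
    for _ in range(iterations):
        new_assign, new_placed = list(assign), set(placed)
        if rng.random() < 0.5:
            # Move one randomly chosen query to a randomly chosen server.
            new_assign[rng.randrange(len(new_assign))] = rng.randrange(n_servers)
        else:
            # Toggle one (MQT, server) placement on or off.
            new_placed ^= {(rng.randrange(len(MQT_BUILD)), rng.randrange(n_servers))}
        cost = makespan(new_assign, new_placed)
        # Keep the move only if it is feasible and not worse.
        if cost is not None and cost <= best:
            assign, placed, best = new_assign, new_placed, cost
    return start, best, assign, placed


if __name__ == "__main__":
    start, best, assign, placed = search()
    print("initial makespan:", start)
    print("final makespan:  ", best)
    print("query -> server: ", assign)
    print("MQT placements:  ", sorted(placed))
```

Bounding the loop by an iteration count rather than wall-clock time keeps the sketch deterministic for a given seed; a time budget could be substituted to mirror the "arbitrarily bounded time" property mentioned in the abstract. Because moves are accepted only when they do not increase the makespan, the final schedule is never worse than the random starting point, though nothing in this sketch guarantees it is near optimal.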