Towards materialized view selection for distributed databases

  • Authors:
  • Leonardo Weiss F. Chaves;Erik Buchmann;Fabian Hueske;Klemens Böhm

  • Affiliations:
  • SAP Research CEC Karlsruhe, Germany;Universität Karlsruhe (TH), Germany;SAP Research CEC Karlsruhe, Germany;Universität Karlsruhe (TH), Germany

  • Venue:
  • Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Materialized views (MV) can significantly improve the query performance of relational databases. In this paper, we consider MVs to optimize complex scenarios where many heterogeneous nodes with different resource constraints (e.g., CPU, IO and network bandwidth) query and update numerous tables on different nodes. Such problems are typical for large enterprises, e.g., global retailers storing thousands of relations on hundreds of nodes at different subsidiaries. Choosing which views to materialize in a distributed, complex scenario is NP-hard. Furthermore, the solution space is huge, and the large number of input factors results in non-monotonic cost models. This prohibits the straightforward use of brute-force algorithms, greedy approaches or proposals from organic computing. For the same reason, all solutions for choosing MVs we are aware of do not consider either distributed settings or update costs. In this paper we describe an algorithmic framework which restricts the sets of considered MVs so that a genetic algorithm can be applied. In order to let the genetic algorithm converge quickly, we generate initial populations based on knowledge on database tuning, and devise a selection function which restricts the solution space by taking the similarity of MV configurations into account. We evaluate our approach both with artificial settings and a real-world RFID scenario from retail. For a small setting consisting of 24 tables distributed over 9 nodes, an exhaustive search needs 10 hours processing time. Our approach derives a comparable set of MVs within 30 seconds. Our approach scales well: Within 15 minutes it chooses a set of MVs for a real-world scenario consisting of 1,000 relations, 400 hosts, and a workload of 3,000 queries and updates.