Efficient Data Distribution for DWS

  • Authors:
  • Raquel Almeida; Jorge Vieira; Marco Vieira; Henrique Madeira; Jorge Bernardino

  • Affiliations:
  • CISUC, Dept. of Informatics Engineering, Univ. of Coimbra, Coimbra, Portugal; CISUC, Critical Software SA, Coimbra, Portugal; CISUC, Dept. of Informatics Engineering, Univ. of Coimbra, Coimbra, Portugal; CISUC, Dept. of Informatics Engineering, Univ. of Coimbra, Coimbra, Portugal; CISUC, ISEC, Coimbra, Portugal

  • Venue:
  • DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2008

Abstract

The DWS (Data Warehouse Striping) technique is a data partitioning approach designed specifically for distributed data warehousing environments. In DWS, the fact tables are distributed across an arbitrary number of low-cost computers, and queries are executed in parallel by all of them, guaranteeing nearly optimal speedup and scale-up. Data loading in data warehouses is typically a heavy process that becomes even more complex in distributed environments. Data partitioning calls for new loading algorithms that reconcile a balanced distribution of data among nodes with an efficient data allocation (vital to achieving low and uniform response times and, consequently, high performance during query execution). This paper evaluates several alternative algorithms and proposes a generic approach for evaluating data distribution algorithms in the context of DWS. The experimental results show that effective loading of the nodes in a DWS system must consider complementary effects: minimizing the number of distinct keys of any large dimension in the fact tables in each node, and splitting correlated rows among the nodes.
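
The tension the abstract describes, between balanced row counts and few distinct dimension keys per node, can be illustrated with a small sketch. The abstract does not name the algorithms the paper evaluates, so the two strategies below (round-robin and key-modulo placement) are assumptions chosen for illustration only, as are the function names, the `customer_id` column, and the toy data.

```python
def round_robin(rows, n_nodes):
    """Assign row i to node i mod n_nodes, which balances row counts."""
    nodes = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def by_key_modulo(rows, n_nodes, key):
    """Assign each row by its dimension key modulo n_nodes,
    so all rows sharing a key value land on the same node."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[key] % n_nodes].append(row)
    return nodes


def distribution_stats(nodes, key):
    """Return (rows per node, distinct key values per node)."""
    row_counts = [len(node) for node in nodes]
    distinct_keys = [len({row[key] for row in node}) for node in nodes]
    return row_counts, distinct_keys


# Hypothetical fact rows: 12 rows whose customer_id cycles over 4 values.
rows = [{"customer_id": i % 4, "amount": 10 * i} for i in range(12)]

rr_stats = distribution_stats(round_robin(rows, 3), "customer_id")
km_stats = distribution_stats(by_key_modulo(rows, 3, "customer_id"), "customer_id")

print(rr_stats)  # ([4, 4, 4], [4, 4, 4])
print(km_stats)  # ([6, 3, 3], [2, 1, 1])
```

On this toy input, round-robin gives every node the same number of rows but also every key value, while key-modulo placement reduces the distinct keys per node at the cost of uneven row counts and of keeping rows that share a key together. This is only a way to see why a loading algorithm has to weigh both effects; it is not a reproduction of the paper's experiments.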