Efficient Data Distribution for DWS

Authors:
Raquel Almeida;Jorge Vieira;Marco Vieira;Henrique Madeira;Jorge Bernardino
Affiliations:
CISUC, Dept. of Informatics Engineering, Univ. of Coimbra, Coimbra, Portugal;CISUC, Critical Software SA, Coimbra, Portugal;CISUC, Dept. of Informatics Engineering, Univ. of Coimbra, Coimbra, Portugal;CISUC, Dept. of Informatics Engineering, Univ. of Coimbra, Coimbra, Portugal;CISUC, ISEC, Coimbra, Portugal
Venue:
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Year:
2008

Abstract

The DWS (Data Warehouse Striping) technique is a data partitioning approach designed specifically for distributed data warehousing environments. In DWS, the fact tables are distributed across an arbitrary number of low-cost computers and queries are executed in parallel by all the computers, guaranteeing nearly optimal speedup and scale-up. Data loading in data warehouses is typically a heavy process that becomes even more complex in distributed environments. Data partitioning brings the need for new loading algorithms that reconcile a balanced distribution of data among nodes with an efficient data allocation (vital to achieve low and uniform response times and, consequently, high performance during query execution). This paper evaluates several alternative algorithms and proposes a generic approach for the evaluation of data distribution algorithms in the context of DWS. The experimental results show that effective loading of the nodes in a DWS system must consider complementary effects: minimizing the number of distinct keys of any large dimension in the fact tables in each node, and splitting correlated rows among the nodes.
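
The trade-off described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The row layout (`customer_id`, `amount`), the function names, and the two placement policies (round-robin by row position and modulo on a dimension key) are assumptions chosen for illustration, and they are not necessarily among the algorithms the paper evaluates. The sketch splits fact rows across nodes, counts the distinct dimension keys that end up on each node, and answers a SUM query by merging per-node partial results.

```python
def round_robin(rows, n_nodes):
    """Hypothetical policy: row i goes to node i mod n_nodes (even row counts)."""
    nodes = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def by_key(rows, n_nodes, key):
    """Hypothetical policy: place each row by its dimension key modulo n_nodes,
    so all rows sharing a key land on the same node."""
    nodes = [[] for _ in range(n_nodes)]
    for row in rows:
        nodes[row[key] % n_nodes].append(row)
    return nodes


def distinct_keys_per_node(nodes, key):
    """Number of distinct values of one dimension key stored on each node."""
    return [len({row[key] for row in node}) for node in nodes]


def parallel_sum(nodes, measure):
    """Each node computes a partial sum; the partial results are then merged."""
    partials = [sum(row[measure] for row in node) for node in nodes]
    return sum(partials)


# Made-up fact rows: 8 customers, 3 rows each.
facts = [{"customer_id": c, "amount": 10 * c} for c in range(8) for _ in range(3)]

rr_nodes = round_robin(facts, 4)
key_nodes = by_key(facts, 4, "customer_id")

print("rows per node (round-robin):", [len(n) for n in rr_nodes])
print("distinct keys (round-robin):", distinct_keys_per_node(rr_nodes, "customer_id"))
print("distinct keys (by key):     ", distinct_keys_per_node(key_nodes, "customer_id"))
print("SUM(amount):", parallel_sum(rr_nodes, "amount"))
```

On this toy data both policies give every node the same number of rows, but key-based placement leaves fewer distinct customer keys on each node. It also keeps all rows for a given customer on a single node, which works against the abstract's second goal of splitting correlated rows among the nodes.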