Overcoming the scalability limitations of parallel star schema data warehouses

  • Authors:
  • João Pedro Costa;José Cecílio;Pedro Martins;Pedro Furtado

  • Affiliations:
  • ISEC-Institute Polytechnic of Coimbra, Portugal;University of Coimbra, Portugal;University of Coimbra, Portugal;University of Coimbra, Portugal

  • Venue:
  • ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
  • Year:
  • 2012

Abstract

Most Data Warehouses (DW) are stored in Relational Database Management Systems (RDBMS) using a star-schema model. While this model yields a trade-off between performance and storage requirements, huge data warehouses experience performance problems. Parallel shared-nothing architectures mitigate these problems through a divide-and-conquer approach, but parallelizing join operations limits the achievable improvement, because joins constrain data placement and require data replication and/or on-the-fly repartitioning. In this paper, we show how these limitations can be overcome by replacing the star schema with a universal relation approach that allows more efficient and scalable parallelization. We evaluate the proposed approach using the TPC-H benchmark and demonstrate that it provides both highly predictable response times and almost optimal speedup.
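
The idea behind the universal relation approach can be illustrated with a minimal sketch. This is not the authors' implementation: the toy tables, the round-robin placement, and every function name below are assumptions made for illustration only. The sketch joins the fact rows with their dimension attributes once, at load time, so that each node can later answer an aggregation query with a local scan and no join.

```python
from collections import defaultdict

# Hypothetical toy data: one fact table with a foreign key into one dimension.
customers = {1: {"nation": "PT"}, 2: {"nation": "ES"}}
orders = [
    {"cust_id": 1, "price": 10.0},
    {"cust_id": 2, "price": 20.0},
    {"cust_id": 1, "price": 5.0},
    {"cust_id": 2, "price": 1.0},
]

def denormalize(facts, dim):
    """Join once at load time so every row carries its dimension attributes."""
    return [{"price": f["price"], **dim[f["cust_id"]]} for f in facts]

def partition(rows, n_nodes):
    """Round-robin placement (an assumption); rows need no key-based co-location."""
    return [rows[i::n_nodes] for i in range(n_nodes)]

def local_sum(rows):
    """Per-node work: a scan with a group-by and no join."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["nation"]] += r["price"]
    return totals

def merge(partials):
    """Combine the per-node partial aggregates into the final answer."""
    totals = defaultdict(float)
    for p in partials:
        for key, value in p.items():
            totals[key] += value
    return dict(totals)

universal = denormalize(orders, customers)
result = merge(local_sum(p) for p in partition(universal, n_nodes=2))
```

Because every row is self-contained, the same query gives the same totals for any number of partitions, and no dimension data has to be replicated or repartitioned at query time. The cost is the extra storage taken by the repeated dimension attributes.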