Overcoming the scalability limitations of parallel star schema data warehouses

Authors:
João Pedro Costa;José Cecílio;Pedro Martins;Pedro Furtado
Affiliations:
ISEC-Institute Polytechnic of Coimbra, Portugal;University of Coimbra, Portugal;University of Coimbra, Portugal;University of Coimbra, Portugal
Venue:
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Year:
2012

Abstract

Most Data Warehouses (DW) are stored in Relational Database Management Systems (RDBMS) using a star-schema model. While this model yields a trade-off between performance and storage requirements, huge data warehouses experience performance problems. Although parallel shared-nothing architectures improve on this matter through a divide-and-conquer approach, issues related to parallelizing join operations limit the achievable improvement, since they have implications for data placement and create the need to replicate data and/or repartition it on the fly. In this paper, we show how these limitations can be overcome by replacing the star schema with a universal relation approach, for more efficient and scalable parallelization. We evaluate the proposed approach using the TPC-H benchmark and demonstrate that it provides both highly predictable response times and almost optimal speedup.