Providing timely results with an elastic parallel DW

Authors:
João Pedro Costa;Pedro Martins;José Cecilio;Pedro Furtado
Affiliations:
Polytechnic Institute of Coimbra, Portugal;University of Coimbra, Portugal;University of Coimbra, Portugal;University of Coimbra, Portugal
Venue:
ISMIS'12 Proceedings of the 20th international conference on Foundations of Intelligent Systems
Year:
2012

Abstract

OLAP analysis is a fundamental tool for enterprises in competitive markets. While known (planned) queries can be tuned to provide fast answers, ad-hoc queries have to process huge volumes of the base DW data, which results in slower response times. Parallel architectures can improve performance through a divide-and-conquer approach, but their structure is rigid and suffers from scalability limitations imposed by the star schema model used in most deployments. They are therefore usually over-dimensioned with computational resources in order to provide fast response times. However, for most business decisions, it is more important to have guarantees that queries will be answered in a timely fashion. The physical representation of the star schema model severely limits scalability and the ability to provide timely execution, due to the well-known parallel join issue and the need for solutions such as on-the-fly repartitioning of data or intermediate results, or massive replication of large data sets that still need to be joined locally. In this paper, we propose PH-ONE, an architecture that overcomes these scalability limitations by combining an elastic set of inexpensive heterogeneous nodes with a denormalized DW storage organization in a shared-nothing scheme, which removes costly joins and requires only a minimal set of predictable processing tasks. PH-ONE delivers timely execution guarantees by adjusting the number of processing nodes and by rebalancing the data load according to the nodes' characteristics. We used the TPC-H benchmark to evaluate PH-ONE's ability to provide timely results.
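
The two mechanisms named in the abstract, rebalancing the data load according to node characteristics and adjusting the number of nodes to meet a time target, can be illustrated with a minimal sketch. This is not the PH-ONE implementation. It assumes each node is characterized by a single scan throughput (rows per second) and that a query over the denormalized data is a pure parallel scan with no joins; the function names `rebalance`, `estimated_time`, and `nodes_needed` are hypothetical.

```python
def rebalance(total_rows: int, throughputs: list[float]) -> list[int]:
    """Split rows across nodes in proportion to each node's throughput.

    Assumption (not from the paper): a node is described by one number,
    its scan throughput in rows per second.
    """
    total = sum(throughputs)
    shares = [int(total_rows * t / total) for t in throughputs]
    # Rows lost to rounding down go to the fastest node.
    shares[throughputs.index(max(throughputs))] += total_rows - sum(shares)
    return shares


def estimated_time(total_rows: int, throughputs: list[float]) -> float:
    """With proportional shares, all nodes finish at about the same time:
    total rows divided by the aggregate throughput."""
    return total_rows / sum(throughputs)


def nodes_needed(total_rows: int, deadline_s: float,
                 active: list[float], spare: list[float]) -> list[float]:
    """Add spare nodes, fastest first, until the estimate meets the deadline."""
    pool = list(active)
    for t in sorted(spare, reverse=True):
        if estimated_time(total_rows, pool) <= deadline_s:
            break
        pool.append(t)
    if estimated_time(total_rows, pool) > deadline_s:
        raise ValueError("deadline cannot be met with the available nodes")
    return pool


if __name__ == "__main__":
    pool = nodes_needed(6_000_000, 2.0, [1_000_000.0], [500_000.0, 2_000_000.0])
    print(pool, rebalance(6_000_000, pool))
```

Because the denormalized model leaves only scan-style work, the time estimate in this sketch depends only on data volume and aggregate throughput, which is what makes a deadline check this simple plausible. A real system would also have to account for data movement during rebalancing, which the sketch ignores.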