Star join revisited: Performance internals for cluster architectures

Authors and affiliations:
Josep Aguilar-Saborit (IBM Toronto Laboratory, 8200 Warden Avenue, Markham, ON, Canada L6G1C7)
Victor Muntés-Mulero (Universitat Politècnica de Catalunya, DAMA-UPC and Computer Architecture Department, Jordi Girona 1-3, Campus Nord-UPC, Modul D6, E-08034 Barcelona, Spain)
Calisto Zuzarte (IBM Toronto Laboratory, 8200 Warden Avenue, Markham, ON, Canada L6G1C7)
Josep-L. Larriba-Pey (Universitat Politècnica de Catalunya, DAMA-UPC and Computer Architecture Department, Jordi Girona 1-3, Campus Nord-UPC, Modul D6, E-08034 Barcelona, Spain)
Venue:
Data & Knowledge Engineering
Year:
2007

Abstract

Data warehouse workloads are crucial for supporting on-line analytical processing (OLAP). Coping with OLAP queries over the huge amounts of data involved calls for large parallel computers. The current trend is to use cluster architectures, which offer a reasonable balance between cost and performance. In such settings, applications must be tuned to minimize I/O and communication so that the global execution time is reduced as much as possible. In this paper, we model and analyze the most up-to-date strategies for ad hoc star join query processing in a cluster of computers. We show that, for ad hoc query processing with a limited amount of available resources, these strategies still have room for improvement in terms of both I/O and inter-node data traffic. Our analysis concludes with the proposal of a hybrid solution that improves on both aspects compared to the previous techniques and shows near-optimal results in a broad spectrum of cases.