Performance analysis of "Groupby-After-Join" query processing in parallel database systems

Authors:
David Taniar;Rebecca Boon-Noi Tan;C. H. C. Leung;K. H. Liu
Affiliations:
School of Business Systems, Monash University, Clayton, Victoria 3800, Australia;School of Business Systems, Monash University, Clayton, Victoria 3800, Australia;School of Computer Science and Mathematics, Victoria University, P.O. Box 14428 MCMC, Melbourne 8001, Australia;Blueridge Systems, 2115 Aldrin Road, #12B, Ocean, NJ
Venue:
Information Sciences—Informatics and Computer Science: An International Journal
Year:
2004

Abstract

Queries that contain aggregate functions often combine multiple tables through join operations. Such queries are referred to here as "Groupby-Join" queries. A special category of these queries requires that the group-by operation be performed only after the join operation; these are known as "Groupby-After-Join" queries and are the focus of this paper. When such queries are processed in parallel, a partitioning attribute must be chosen: either the join attribute or the group-by attribute. Depending on which attribute is used, two parallel processing methods are discussed, namely the join partition method (JPM) and the aggregate partition method (APM). The behaviour of both parallelization methods is described by means of cost models, and the methods are evaluated experimentally through simulation. The simulation results show that the aggregate partition method performs better than the join partition method.
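
The difference between the two partitioning choices can be illustrated with a small single-process sketch. It is not the paper's implementation: the toy tables R and S, the helper names, and the choice of SUM as the aggregate are all hypothetical. The sketch also assumes that under JPM both tables are partitioned on the join attribute and partial aggregates are merged in a final step, and that under APM the table holding the group-by attribute is partitioned on that attribute while the other table is replicated to every processor. The abstract does not spell out these details.

```python
from collections import defaultdict

# Hypothetical toy tables (not from the paper):
#   R(join_key, group_key) and S(join_key, value)
R = [(1, "a"), (2, "b"), (3, "a"), (4, "b")]
S = [(1, 10), (2, 20), (3, 30), (3, 5), (4, 40)]


def stable_hash(key):
    """Deterministic hash so the partitioning is repeatable across runs."""
    return sum(str(key).encode())


def partition(rows, key_pos, n):
    """Split rows into n partitions by hashing the attribute at key_pos."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[stable_hash(row[key_pos]) % n].append(row)
    return parts


def join_and_sum(r_part, s_part):
    """Local work of one simulated processor: hash join on join_key,
    then SUM(value) grouped by group_key."""
    index = defaultdict(list)
    for join_key, value in s_part:
        index[join_key].append(value)
    sums = defaultdict(int)
    for join_key, group_key in r_part:
        for value in index.get(join_key, []):
            sums[group_key] += value
    return dict(sums)


def jpm(r_rows, s_rows, n):
    """Join partition method (assumed form): partition both tables on the
    join attribute, join and aggregate locally, then merge partial sums."""
    r_parts = partition(r_rows, 0, n)
    s_parts = partition(s_rows, 0, n)
    partials = [join_and_sum(r, s) for r, s in zip(r_parts, s_parts)]
    total = defaultdict(int)
    for partial in partials:  # global aggregation step
        for group_key, value in partial.items():
            total[group_key] += value
    return dict(total)


def apm(r_rows, s_rows, n):
    """Aggregate partition method (assumed form): partition R on the
    group-by attribute and replicate S, so each group is completed locally."""
    result = {}
    for r_part in partition(r_rows, 1, n):
        result.update(join_and_sum(r_part, s_rows))  # groups never overlap
    return result


jpm_result = jpm(R, S, 3)
apm_result = apm(R, S, 3)
```

Both functions return the same per-group totals. In this sketch the trade-off is that JPM needs an extra merge of partial aggregates, whereas APM avoids the merge at the price of replicating one table. This is the kind of difference a cost model would quantify.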