Performance analysis of "Groupby-After-Join" query processing in parallel database systems

  • Authors:
  • David Taniar; Rebecca Boon-Noi Tan; C. H. C. Leung; K. H. Liu

  • Affiliations:
  • School of Business Systems, Monash University, Clayton, Victoria 3800, Australia; School of Business Systems, Monash University, Clayton, Victoria 3800, Australia; School of Computer Science and Mathematics, Victoria University, P.O. Box 14428 MCMC, Melbourne 8001, Australia; Blueridge Systems, 2115 Aldrin Road, #12B, Ocean, NJ

  • Venue:
  • Information Sciences—Informatics and Computer Science: An International Journal
  • Year:
  • 2004

Abstract

Queries containing aggregate functions often combine multiple tables through join operations. Such queries are referred to as "Groupby-Join" queries. In a special category of these queries, the group-by operation can only be performed after the join operation. These "Groupby-After-Join" queries are the focus of this paper. In parallel processing of such queries, a partitioning attribute must be chosen: either the join attribute or the group-by attribute. Based on this choice, two parallel processing methods are discussed: the join partition method (JPM) and the aggregate partition method (APM). The behaviour of both parallelization methods is described in terms of cost models, and experiments are carried out through simulation. The simulation results show that the aggregate partition method performs better than the join partition method.
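
To make the choice of partitioning attribute concrete, the following is a minimal single-process sketch of one plausible reading of the two methods. It is not the paper's algorithm or cost model. The tables `projects(pid, dept)` and `assignments(pid, hours)`, the query "total hours per department", the helper names, and the detail that the APM variant replicates the second table to every partition are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def bucket(key, n):
    """Map a key to one of n simulated processors (deterministic hash)."""
    return zlib.crc32(str(key).encode()) % n


def local_join_and_sum(projects, assignments):
    """Join on pid, then sum hours per dept. Assumes pid is unique in projects."""
    dept_of = {pid: dept for pid, dept in projects}
    totals = defaultdict(int)
    for pid, hours in assignments:
        if pid in dept_of:  # inner join: unmatched rows are dropped
            totals[dept_of[pid]] += hours
    return dict(totals)


def jpm(projects, assignments, n):
    """Join-partition sketch: partition BOTH tables on the join attribute (pid).

    Each partition yields partial sums, because one dept can span several
    partitions, so a final merge step combines the partial results.
    """
    final = defaultdict(int)
    for i in range(n):
        p = [r for r in projects if bucket(r[0], n) == i]
        a = [r for r in assignments if bucket(r[0], n) == i]
        for dept, subtotal in local_join_and_sum(p, a).items():
            final[dept] += subtotal
    return dict(final)


def apm(projects, assignments, n):
    """Aggregate-partition sketch: partition projects on the group-by attribute
    (dept) and, as an assumption, replicate assignments to every partition.

    Each dept lives in exactly one partition, so local results are already final.
    """
    final = {}
    for i in range(n):
        p = [r for r in projects if bucket(r[1], n) == i]
        final.update(local_join_and_sum(p, assignments))
    return final


# Hypothetical toy data: pid 5 has no matching project row.
projects = [(1, "A"), (2, "A"), (3, "B"), (4, "C")]
assignments = [(1, 10), (2, 5), (3, 7), (3, 3), (4, 1), (5, 99)]

jpm_result = jpm(projects, assignments, 3)
apm_result = apm(projects, assignments, 3)
```

Both functions return the same per-department totals on this data. The sketch only shows where the work falls in each method (a merge step after partitioning on the join attribute, replication of a table when partitioning on the group-by attribute) and says nothing about their relative performance.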