Parallel Processing of "GroupBy-Before-Join" Queries in Cluster Architecture

  • Authors:
  • David Taniar; J. Wenny Rahayu

  • Venue:
  • CCGRID '01 Proceedings of the 1st International Symposium on Cluster Computing and the Grid
  • Year:
  • 2001

Abstract

SQL queries in the real world are replete with group-by and join operations. Queries of this type are often known as "GroupBy-Join" queries. In some GroupBy-Join queries, it is desirable to perform the group-by before the join in order to achieve better performance. This subset of GroupBy-Join queries is called "GroupBy-Before-Join" queries. In this paper, we present a study on the parallelization of GroupBy-Before-Join queries, particularly by exploiting cluster architectures. From our study, we have learned that in parallel query optimization, processing the group-by as early as possible is not always desirable. On many occasions, distributing the data before performing the group-by offers performance advantages. In this study, we also describe our cluster-based scheme.
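
The two processing orders contrasted in the abstract can be illustrated with a small sketch. This is not the authors' scheme: the tables, the `owner` placement function, and all function names are hypothetical, and the "processors" are simulated as plain Python lists. It assumes a SUM aggregate and a join attribute that is the same as the group-by attribute. The sketch only shows that both orders yield the same answer and counts how many records each one routes; it does not model communication or processing cost.

```python
from collections import Counter

# Hypothetical data: each inner list is the fragment of a "sales" table
# held by one simulated processor; rows are (product_id, quantity).
fragments = [
    [(1, 5), (2, 3), (1, 2)],
    [(2, 4), (3, 1), (1, 1)],
    [(3, 6), (2, 2)],
]
# Hypothetical second table, joined on the group-by attribute.
products = {1: "bolt", 2: "nut", 3: "washer"}
NUM_PROCS = len(fragments)


def owner(key):
    """Processor responsible for a key (simple modulo placement, an assumption)."""
    return key % NUM_PROCS


def early_group_by(fragments):
    """Aggregate locally first, then route partial sums and merge them."""
    merged = [Counter() for _ in range(NUM_PROCS)]
    routed = 0
    for frag in fragments:
        local = Counter()
        for key, qty in frag:
            local[key] += qty              # first pass: local aggregation
        for key, partial in local.items():
            merged[owner(key)][key] += partial  # second pass: merge partials
            routed += 1
    return merged, routed


def distribute_first(fragments):
    """Route raw rows on the group-by attribute, then aggregate once."""
    received = [[] for _ in range(NUM_PROCS)]
    routed = 0
    for frag in fragments:
        for key, qty in frag:
            received[owner(key)].append((key, qty))
            routed += 1
    grouped = []
    for rows in received:
        totals = Counter()
        for key, qty in rows:
            totals[key] += qty             # single aggregation pass
        grouped.append(totals)
    return grouped, routed


def join_with_products(grouped):
    """Join the aggregated groups with the product table, after the group-by."""
    return {
        products[key]: total
        for part in grouped
        for key, total in part.items()
        if key in products
    }


if __name__ == "__main__":
    for strategy in (early_group_by, distribute_first):
        grouped, routed = strategy(fragments)
        print(strategy.__name__, join_with_products(grouped), "routed:", routed)
```

In this toy data set the early group-by routes fewer records but aggregates twice (locally, then when merging partial sums), whereas distributing first routes every raw row but aggregates only once. Which order is cheaper on a real cluster depends on factors the sketch leaves out, such as the number of groups and the cost of communication.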