Performance analysis of "Groupby-After-Join" query processing in parallel database systems
Information Sciences—Informatics and Computer Science: An International Journal
Real-world SQL queries are replete with group-by and join operations. Queries that combine the two are often known as "GroupBy-Join" queries. In some GroupBy-Join queries, it is desirable to perform the group-by before the join in order to achieve better performance. This subset of GroupBy-Join queries is called "GroupBy-Before-Join" queries. In this paper, we present a study on the parallelization of GroupBy-Before-Join queries, particularly by exploiting cluster architectures. We found that, in parallel query optimization, processing the group-by as early as possible is not always desirable. On many occasions, distributing the data before performing the group-by offers performance advantages. We also describe our cluster-based scheme.
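To make the trade-off concrete, the following is a minimal Python sketch of the two orderings discussed above, not the cluster-based scheme of the paper. It assumes a hypothetical query that sums order amounts per customer and joins the totals with a customer table on the same key. The function names (`early_groupby`, `distribute_first`, and the helpers), the sample data, and the simplification that the customer table is visible to every simulated node are all assumptions made for illustration.

```python
from collections import defaultdict


def group_sum(rows):
    """Sum the amounts per key for an iterable of (key, amount) rows."""
    totals = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def hash_partition(rows, n):
    """Route each (key, value) row to partition hash(key) % n."""
    parts = [[] for _ in range(n)]
    for key, value in rows:
        parts[hash(key) % n].append((key, value))
    return parts


def join(totals, customers):
    """Join aggregated totals with the customer table on the shared key."""
    return {k: (customers[k], t) for k, t in totals.items() if k in customers}


def early_groupby(fragments, customers, n):
    """Local group-by on every fragment, then redistribute the partial sums."""
    partials = [pair for frag in fragments for pair in group_sum(frag).items()]
    result = {}
    for part in hash_partition(partials, n):
        # A second aggregation merges partial sums that share a key.
        result.update(join(group_sum(part), customers))
    return result


def distribute_first(fragments, customers, n):
    """Redistribute raw rows by key first, then group-by once per node."""
    raw = [row for frag in fragments for row in frag]
    result = {}
    for part in hash_partition(raw, n):
        result.update(join(group_sum(part), customers))
    return result


# Hypothetical data: two fragments of (customer_id, amount) rows.
customers = {1: "Ann", 2: "Bob", 3: "Cy"}
fragments = [[(1, 10), (2, 5), (1, 7)], [(2, 1), (3, 4), (4, 9)]]

a = early_groupby(fragments, customers, 2)
b = distribute_first(fragments, customers, 2)
assert a == b  # both orderings produce the same answer
```

Both functions return the same result; they differ only in how much work happens before and after redistribution. The early group-by aggregates twice but ships fewer rows when fragments contain many duplicate keys, whereas distributing first aggregates once but ships every raw row. Which ordering is cheaper depends on the data, which is consistent with the observation that grouping as early as possible is not always the better choice.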