An efficient algorithm for computing range-groupby queries

  • Authors:
  • Young-Koo Lee; Woong-Kee Loh; Yang-Sae Moon; Kyu-Young Whang; Il-Yeol Song

  • Affiliations:
  • Department of Computer Science & Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea; Department of Computer Science & Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea; Department of Computer Science, Kangwon National University, Chunchon, Kangwon, Korea; Department of Computer Science & Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea; College of Information Science and Technology, Drexel University, Philadelphia, Pennsylvania

  • Venue:
  • DASFAA'06: Proceedings of the 11th International Conference on Database Systems for Advanced Applications
  • Year:
  • 2006

Abstract

Aggregation queries over arbitrary regions of an n-dimensional space are powerful tools for data analysis in OLAP. GROUP BY queries are very important in OLAP because they allow us to summarize trends along any combination of dimensions. In this paper, we extend previous aggregation queries by adding a GROUP BY clause for arbitrary regions. We call this extension range-groupby queries and present an efficient algorithm for processing them. A typical method for achieving fast response times for aggregation queries is to use a prefix-sum array, which stores precomputed partial aggregation values. A naive method for range-groupby queries maintains a separate prefix-sum array for each combination of grouping dimensions in an n-dimensional cube, which incurs enormous storage overhead. Our algorithm maintains only one prefix-sum array and still processes range-groupby queries efficiently for all possible combinations of grouping dimensions. Our algorithm reduces the space overhead to $O(\frac{1}{2^n})$ of that required by the naive method, while accessing almost the same number of cells.
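To make the prefix-sum idea concrete, here is a minimal Python sketch, offered as an illustration rather than as the authors' algorithm. It builds a single prefix-sum array over an n-dimensional cube and answers a range-sum query by inclusion-exclusion over at most 2^n corner cells. It then answers a range-groupby query from that same array by fixing each grouping dimension to one value at a time. The function names `build_prefix_sum`, `range_sum`, and `range_groupby` are hypothetical. Storing the cube as a dict keyed by index tuples is also an assumption, chosen for brevity. Note that this straightforward approach may read up to 2^n cells per output group. The paper reports that its algorithm accesses almost the same number of cells as the naive multi-array method, and that technique is not reproduced here.

```python
from itertools import product


def build_prefix_sum(cube, shape):
    """Return P with P[idx] = sum of cube[j] over all j <= idx (component-wise).

    Hypothetical helper: `cube` is a dict mapping index tuples to values.
    The sum is built by a cumulative scan along each dimension in turn.
    """
    P = dict(cube)  # copy so the input cube is not modified
    for d in range(len(shape)):
        # product() yields indices in lexicographic order, so the predecessor
        # along dimension d has already been accumulated when we reach idx.
        for idx in product(*(range(s) for s in shape)):
            if idx[d] > 0:
                prev = idx[:d] + (idx[d] - 1,) + idx[d + 1:]
                P[idx] += P[prev]
    return P


def range_sum(P, lo, hi):
    """Sum of the cube over the inclusive box lo..hi, read from at most 2^n cells of P."""
    total = 0
    for choice in product((0, 1), repeat=len(lo)):
        # choice 0 -> use hi[d]; choice 1 -> use lo[d] - 1 (and flip the sign)
        corner = tuple(h if c == 0 else l - 1 for c, l, h in zip(choice, lo, hi))
        if min(corner) < 0:
            continue  # a corner outside the cube contributes 0
        total += (-1) ** sum(choice) * P[corner]
    return total


def range_groupby(P, lo, hi, group_dims):
    """GROUP BY `group_dims` over the box lo..hi, using the single array P.

    Each group is the box with every grouping dimension pinned to one value.
    """
    result = {}
    for key in product(*(range(lo[d], hi[d] + 1) for d in group_dims)):
        cell_lo, cell_hi = list(lo), list(hi)
        for d, v in zip(group_dims, key):
            cell_lo[d] = cell_hi[d] = v
        result[key] = range_sum(P, tuple(cell_lo), tuple(cell_hi))
    return result


if __name__ == "__main__":
    # 3x3 cube with values 1..9, row by row
    cube = {(i, j): 3 * i + j + 1 for i in range(3) for j in range(3)}
    P = build_prefix_sum(cube, (3, 3))
    print(range_sum(P, (1, 1), (2, 2)))                # 28
    print(range_groupby(P, (0, 1), (2, 2), [0]))       # {(0,): 5, (1,): 11, (2,): 17}
```

By contrast, the naive method described in the abstract would keep one prefix-sum array per combination of grouping dimensions. For n dimensions, that means one array per subset, or 2^n arrays, whereas this sketch keeps only one.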