A one-pass aggregation algorithm with the optimal buffer size in multidimensional OLAP

  • Authors:
  • Young-Koo Lee;Kyu-Young Whang;Yang-Sae Moon;Il-Yeol Song

  • Affiliations:
  • Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea;Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea;Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea;College of Information Science and Technology, Drexel University, Philadelphia, Pennsylvania

  • Venue:
  • VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
  • Year:
  • 2002

Abstract

Aggregation is an operation that plays a key role in multidimensional OLAP (MOLAP). Existing aggregation methods in MOLAP have been proposed for file structures such as multidimensional arrays. These file structures are suitable for data with uniform distributions, but do not work well with skewed distributions. In this paper, we consider an aggregation method that uses dynamic multidimensional files adapting to skewed distributions. In these multidimensional files, the sizes of page regions vary according to the data density in these regions, and the pages that belong to a larger region are accessed multiple times while computing aggregations. To solve this problem, we first present an aggregation computation model, called the Disjoint-Inclusive Partition (DIP) computation model, that is the formal basis of our approach. Based on this model, we then present the one-pass aggregation algorithm. This algorithm computes aggregations using the one-pass buffer size, which is the minimum buffer size required for guaranteeing one disk access per page. We prove that our aggregation algorithm is optimal with respect to the one-pass buffer size under our aggregation computation model. Using the DIP computation model allows us to correctly predict the order of accessing data pages in advance. Thus, our algorithm achieves the optimal one-pass buffer size by using a buffer replacement policy, such as Belady's B0 or Toss-Immediate policies, that exploits the page access order computed in advance. Since the page access order is not known a priori in general, these policies have been known to lack practicality despite their theoretical significance. Nevertheless, in this paper, we show that these policies can be effectively used for aggregation computation.

We have conducted extensive experiments. We first demonstrate that the one-pass buffer size theoretically derived is indeed correct in real environments. We then compare the performance of the one-pass algorithm with those of other algorithms. Experimental results for a real data set show that the one-pass algorithm reduces the number of disk accesses by up to 7.31 times compared with a naive algorithm. We also show that the memory requirement of our algorithm for processing the aggregation in one pass is very small, being 0.05% to 0.6% of the size of the database. These results indicate that our algorithm is practically usable even for a fairly large database. We believe our work provides an excellent formal basis for investigating further issues in computing aggregations in MOLAP.
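The role of a page access order that is known in advance can be illustrated with a small simulation. The sketch below is not the paper's DIP-based algorithm; it only assumes that some access order is already given as a list of page identifiers. The function names `one_pass_buffer_size` and `count_disk_reads` and the example access order are hypothetical, chosen for illustration. The first function computes the smallest buffer that lets every page be read once when pages are discarded right after their final use (in the spirit of a toss-immediate policy); the second counts disk reads under a Belady-style policy that evicts the resident page whose next use lies farthest in the future.

```python
from typing import Dict, Hashable, List, Sequence, Set


def one_pass_buffer_size(access_order: Sequence[Hashable]) -> int:
    """Smallest buffer (in pages) allowing one disk read per page.

    Assumes the full access order is known in advance. A page stays
    resident from its first access to its last access and is discarded
    immediately afterwards; the answer is the peak number of resident pages.
    """
    last_use: Dict[Hashable, int] = {p: i for i, p in enumerate(access_order)}
    resident: Set[Hashable] = set()
    peak = 0
    for i, page in enumerate(access_order):
        resident.add(page)
        peak = max(peak, len(resident))
        if last_use[page] == i:
            resident.discard(page)  # final use: free the frame right away
    return peak


def count_disk_reads(access_order: Sequence[Hashable], buffer_size: int) -> int:
    """Count disk reads under a Belady-style (farthest-next-use) policy."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")

    # next_use[i] = index of the next access to the same page, or infinity.
    next_use: List[float] = [float("inf")] * len(access_order)
    seen_at: Dict[Hashable, int] = {}
    for i in range(len(access_order) - 1, -1, -1):
        page = access_order[i]
        next_use[i] = seen_at.get(page, float("inf"))
        seen_at[page] = i

    resident: Dict[Hashable, float] = {}  # page -> index of its next use
    reads = 0
    for i, page in enumerate(access_order):
        if page not in resident:
            reads += 1
            if len(resident) >= buffer_size:
                # Evict the page that will not be needed for the longest time.
                victim = max(resident, key=lambda p: resident[p])
                del resident[victim]
        resident[page] = next_use[i]
    return reads


# Hypothetical access order in which pages A, B, and C are each needed twice.
example_order = ["A", "B", "A", "C", "B", "D", "C"]
needed = one_pass_buffer_size(example_order)
print(needed, count_disk_reads(example_order, needed))
```

For this example order, a two-page buffer is enough for each of the four distinct pages to be read exactly once, while a one-page buffer forces a read on every access. How the access order itself is derived from the file structure is the subject of the paper and is not modeled here.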