A one-pass aggregation algorithm with the optimal buffer size in multidimensional OLAP

Authors:
Young-Koo Lee;Kyu-Young Whang;Yang-Sae Moon;Il-Yeol Song
Affiliations:
Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea;Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea;Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Taejon, Korea;College of Information Science and Technology, Drexel University, Philadelphia, Pennsylvania
Venue:
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Year:
2002

Abstract

Aggregation is an operation that plays a key role in multidimensional OLAP (MOLAP). Existing aggregation methods in MOLAP have been proposed for file structures such as multidimensional arrays. These file structures are suitable for data with uniform distributions, but do not work well with skewed distributions. In this paper, we consider an aggregation method that uses dynamic multidimensional files adapting to skewed distributions. In these multidimensional files, the sizes of page regions vary according to the data density in these regions, and the pages that belong to a larger region are accessed multiple times while computing aggregations. To solve this problem, we first present an aggregation computation model, called the Disjoint-Inclusive Partition (DIP) computation model, that is the formal basis of our approach. Based on this model, we then present the one-pass aggregation algorithm. This algorithm computes aggregations using the one-pass buffer size, which is the minimum buffer size required for guaranteeing one disk access per page. We prove that our aggregation algorithm is optimal with respect to the one-pass buffer size under our aggregation computation model. Using the DIP computation model allows us to correctly predict the order of accessing data pages in advance. Thus, our algorithm achieves the optimal one-pass buffer size by using a buffer replacement policy, such as Belady's B0 or Toss-Immediate policies, that exploits the page access order computed in advance. Since the page access order is not known a priori in general, these policies have been known to lack practicality despite their theoretical significance. Nevertheless, in this paper, we show that these policies can be effectively used for aggregation computation.

We have conducted extensive experiments. We first demonstrate that the one-pass buffer size theoretically derived is indeed correct in real environments. We then compare the performance of the one-pass algorithm with that of other algorithms. Experimental results for a real data set show that the one-pass algorithm reduces the number of disk accesses by a factor of up to 7.31 compared with a naive algorithm. We also show that the memory requirement of our algorithm for processing the aggregation in one pass is very small, being 0.05% to 0.6% of the size of the database. These results indicate that our algorithm is practically usable even for a fairly large database. We believe our work provides an excellent formal basis for investigating further issues in computing aggregations in MOLAP.
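
The idea of a buffer size that guarantees one disk access per page, given a page access order known in advance, can be illustrated with a small sketch. This is not the paper's DIP computation model or its algorithm: the access order below is a made-up example, and the function names `one_pass_buffer_size` and `count_disk_reads` are hypothetical. The first function tosses a page immediately after its last use and records the peak number of resident pages. The second simulates a replacement policy that evicts the page whose next use is farthest in the future, which is the usual textbook reading of Belady's optimal policy and is assumed here.

```python
def one_pass_buffer_size(access_order):
    """Peak number of pages that must be resident at once so that
    every page is read from disk exactly once (illustrative sketch)."""
    last = {page: t for t, page in enumerate(access_order)}
    resident = set()
    peak = 0
    for t, page in enumerate(access_order):
        resident.add(page)
        peak = max(peak, len(resident))
        if last[page] == t:
            # Toss the page immediately after its final access.
            resident.discard(page)
    return peak


def count_disk_reads(access_order, buffer_size):
    """Count disk reads when the victim is always the buffered page
    whose next use lies farthest in the future (assumed policy)."""
    inf = float("inf")
    # next_use[t] is the index of the next access to the same page, or inf.
    next_use = [inf] * len(access_order)
    seen_at = {}
    for t in range(len(access_order) - 1, -1, -1):
        page = access_order[t]
        next_use[t] = seen_at.get(page, inf)
        seen_at[page] = t

    buffer = {}  # page -> index of its next use
    reads = 0
    for t, page in enumerate(access_order):
        if page not in buffer:
            reads += 1
            if len(buffer) >= buffer_size:
                victim = max(buffer, key=buffer.get)
                del buffer[victim]
        buffer[page] = next_use[t]
    return reads


# Hypothetical access order in which some pages are revisited.
order = ["A", "B", "A", "C", "B", "D", "C", "D"]
size = one_pass_buffer_size(order)
print(size, count_disk_reads(order, size), count_disk_reads(order, size - 1))
```

In this toy order, a buffer of the computed size lets each of the four distinct pages be read once, while a buffer one page smaller forces repeated reads. Both functions depend on knowing the whole access order before the scan starts, which is the property the abstract attributes to the DIP computation model.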