An aggregation algorithm using a multidimensional file in multidimensional OLAP
Information Sciences: an International Journal
Aggregation is an operation that plays a key role in multidimensional OLAP (MOLAP). Existing aggregation methods in MOLAP have been proposed for file structures such as multidimensional arrays. These file structures are suitable for data with uniform distributions, but do not work well with skewed distributions. In this paper, we consider an aggregation method that uses dynamic multidimensional files adapting to skewed distributions. In these multidimensional files, the sizes of page regions vary according to the data density in these regions, and the pages that belong to a larger region are accessed multiple times while computing aggregations. To solve this problem, we first present an aggregation computation model, called the Disjoint-Inclusive Partition (DIP) computation model, that is the formal basis of our approach. Based on this model, we then present the one-pass aggregation algorithm. This algorithm computes aggregations using the one-pass buffer size, which is the minimum buffer size required for guaranteeing one disk access per page. We prove that our aggregation algorithm is optimal with respect to the one-pass buffer size under our aggregation computation model. Using the DIP computation model allows us to correctly predict the order of accessing data pages in advance. Thus, our algorithm achieves the optimal one-pass buffer size by using a buffer replacement policy, such as Belady's B0 or Toss-Immediate policies, that exploits the page access order computed in advance. Since the page access order is not known a priori in general, these policies have been known to lack practicality despite their theoretical significance. Nevertheless, in this paper, we show that these policies can be effectively used for aggregation computation. We have conducted extensive experiments. We first demonstrate that the theoretically derived one-pass buffer size is indeed correct in real environments. We then compare the performance of the one-pass algorithm with that of other algorithms.
Experimental results for a real data set show that the one-pass algorithm reduces the number of disk accesses by up to 7.31 times compared with a naive algorithm. We also show that the memory requirement of our algorithm for processing the aggregation in one pass is very small, being 0.05% to 0.6% of the size of the database. These results indicate that our algorithm is practically usable even for a fairly large database. We believe our work provides an excellent formal basis for investigating further issues in computing aggregations in MOLAP.
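
The role of a page access order that is known in advance can be illustrated with a small sketch. The Python code below is not the paper's DIP-based algorithm; it is a generic, minimal implementation of Belady's farthest-next-use replacement policy, together with a brute-force search for the smallest buffer that yields exactly one read per distinct page. The function names (`belady_reads`, `one_pass_buffer_size`) and the sample access order are hypothetical and chosen only for illustration.

```python
def belady_reads(access_order, buffer_size):
    """Count page reads under Belady's policy for a fully known access order."""
    inf = float("inf")
    # next_use[i] = position of the next access to the same page, or inf if none.
    next_use = [inf] * len(access_order)
    last_seen = {}
    for i in range(len(access_order) - 1, -1, -1):
        page = access_order[i]
        next_use[i] = last_seen.get(page, inf)
        last_seen[page] = i

    buffer = {}  # page -> position of its next access
    reads = 0
    for i, page in enumerate(access_order):
        if page not in buffer:
            reads += 1  # page fault: the page must be read from disk
            if len(buffer) >= buffer_size:
                # Evict the buffered page whose next access is farthest away.
                victim = max(buffer, key=buffer.get)
                del buffer[victim]
        buffer[page] = next_use[i]
    return reads


def one_pass_buffer_size(access_order):
    """Smallest buffer size for which every distinct page is read exactly once."""
    distinct = len(set(access_order))
    for size in range(1, distinct + 1):
        if belady_reads(access_order, size) == distinct:
            return size
    return distinct


# Hypothetical access order: page "A" stands for a page of a large region
# that is needed repeatedly while smaller regions are processed.
access_order = ["A", "B", "A", "C", "B", "A", "D"]
reads_by_size = {size: belady_reads(access_order, size) for size in (1, 2, 3)}
min_size = one_pass_buffer_size(access_order)
```

For this sample order, a buffer of one page causes 7 reads, two pages cause 5 reads, and three pages cause 4 reads, which equals the number of distinct pages, so `min_size` is 3. The sketch only shows why such a policy becomes usable once the access order can be computed beforehand; how the order is derived from the file structure is the subject of the paper itself.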