An aggregation algorithm using a multidimensional file in multidimensional OLAP

Authors:
Young-Koo Lee;Kyu-Young Whang;Yang-Sae Moon;Il-Yeol Song
Affiliations:
Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), 373-1, Kusong-Dong Yusong-Gu, Taejon 305-701 ...;Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), 373-1, Kusong-Dong Yusong-Gu, Taejon 305-701 ...;Department of Computer Science and Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), 373-1, Kusong-Dong Yusong-Gu, Taejon 305-701 ...;College of Information Science and Technology, Drexel University, Philadelphia, PA
Venue:
Information Sciences: an International Journal
Year:
2003

Abstract

Aggregation plays a key role in multidimensional OLAP (MOLAP). Existing aggregation methods in MOLAP have been proposed for file structures such as multidimensional arrays. These file structures suit data with uniform distributions but do not work well with skewed distributions. In this paper, we consider an aggregation method that uses dynamic multidimensional files, which adapt to skewed distributions. In these files, the sizes of page regions vary according to the data density in those regions, and pages that belong to a larger region are accessed multiple times while computing aggregations. To solve this problem, we first present an aggregation computation model that uses the new notions of disjoint-inclusive partition and induced space filling curves. Based on this model, we then present a dynamic aggregation algorithm. Using these notions, the algorithm maximizes the effectiveness of the buffer: we control the page access order so that a page being accessed can remain in the buffer until its next access. We have conducted experiments to show the effectiveness of our approach. Experimental results for a real data set show that the algorithm reduces the number of disk accesses by a factor of up to 5.09 compared with a naive algorithm. The results further show that the algorithm achieves near-optimal performance (i.e., normalized I/O = 1.01) with total main memory (needed for the buffer and the result table) of less than 1.0% of the database size. We believe our work also provides an excellent formal basis for investigating further issues in computing aggregations in MOLAP.
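
The buffer effect described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' algorithm: it does not implement disjoint-inclusive partitions or induced space filling curves. It assumes a hypothetical 4 x 4 grid of cells in which three sparse quadrants are each stored on one large page and the dense quadrant is split into four small pages, and it uses a plain Z-order (Morton) traversal as a stand-in for a region-aware access order. All names (`count_reads`, `z_order`, `page_of`) are illustrative.

```python
from collections import OrderedDict


def count_reads(cell_order, page_of, buffer_pages):
    """Count simulated disk reads when visiting cells in the given order
    through an LRU buffer that holds at most `buffer_pages` pages."""
    buffer = OrderedDict()
    reads = 0
    for cell in cell_order:
        page = page_of[cell]
        if page in buffer:
            buffer.move_to_end(page)        # buffer hit: refresh LRU position
        else:
            reads += 1                      # buffer miss: simulated disk read
            buffer[page] = True
            if len(buffer) > buffer_pages:
                buffer.popitem(last=False)  # evict least recently used page
    return reads


def z_order(n):
    """Cells of an n x n grid (n a power of two) in Z-order."""
    def key(cell):
        r, c = cell
        k = 0
        for bit in range(n.bit_length()):
            k |= ((r >> bit) & 1) << (2 * bit + 1)  # row bits: odd positions
            k |= ((c >> bit) & 1) << (2 * bit)      # column bits: even positions
        return k
    return sorted(((r, c) for r in range(n) for c in range(n)), key=key)


N = 4
# Hypothetical skewed layout: the dense bottom-right quadrant is split into
# four 1 x 1 pages; each of the other quadrants is one 2 x 2 page.
page_of = {}
for r in range(N):
    for c in range(N):
        if r >= 2 and c >= 2:
            page_of[(r, c)] = f"small-{r}-{c}"
        else:
            page_of[(r, c)] = f"large-{r // 2}-{c // 2}"

row_major = [(r, c) for r in range(N) for c in range(N)]
distinct_pages = len(set(page_of.values()))

naive_reads = count_reads(row_major, page_of, buffer_pages=1)
ordered_reads = count_reads(z_order(N), page_of, buffer_pages=1)

print("distinct pages:", distinct_pages)
print("row-major reads:", naive_reads)
print("Z-order reads:", ordered_reads)
```

With a one-page buffer, the row-major scan re-reads each large page once per row it spans, whereas the Z-order scan finishes every cell of a large page before moving on, so each page is read exactly once. In this toy layout that is 10 reads versus 7, the latter equal to the number of distinct pages.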