Linear clustering of objects with multiple attributes
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
An array-based algorithm for simultaneous multidimensional aggregates
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
OLAP and statistical databases: similarities and differences
PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Optimal partial-match retrieval when fields are independently specified
ACM Transactions on Database Systems (TODS)
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals
Data Mining and Knowledge Discovery
Efficient Organization of Large Multidimensional Arrays
Proceedings of the Tenth International Conference on Data Engineering
Physical Schemas for Large Multidimensional Arrays in Scientific Computing Applications
Proceedings of the Seventh International Working Conference on Scientific and Statistical Database Management
Extendible Arrays for Statistical Databases and OLAP Applications
SSDBM '96 Proceedings of the Eighth International Conference on Scientific and Statistical Database Management
SISYPHUS: the implementation of a chunk-based storage manager for OLAP data cubes
Data & Knowledge Engineering - Special issue: Advances in OLAP
Convex Optimization
Efficient Storage Allocation of Large-Scale Extendible Multi-dimensional Scientific Datasets
SSDBM '06 Proceedings of the 18th International Conference on Scientific and Statistical Database Management
ArrayStore: a storage manager for complex parallel array processing
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Normalised LCS-based method for indexing multidimensional data cube
International Journal of Intelligent Information and Database Systems
Very large multidimensional arrays are commonly used in data-intensive scientific computations as well as in on-line analytical processing applications referred to as MOLAP. Such arrays are stored on disk by partitioning the large global array into fixed-size sub-arrays, called chunks or tiles, that form the units of data transfer between disk and memory. Typical queries retrieve sub-arrays and therefore access every chunk that overlaps the query result. An important metric of storage efficiency is the expected number of chunks retrieved over all such queries. The question that immediately arises is "what shapes of array chunks give the minimum expected number of chunks over a query workload?" The problem of optimal chunking was first introduced by Sarawagi and Stonebraker [11], who gave an approximate solution. In this paper we develop exact mathematical models of the problem and provide exact solutions using steepest descent and geometric programming methods. Experimental results, using synthetic and real-life workloads, show that our solutions are consistently within 2.0% of the true number of chunks retrieved for any number of dimensions. In contrast, the approximate solution of [11] can deviate considerably from the true result as the number of dimensions increases, and may also lead to suboptimal chunk shapes.
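To make the metric concrete, the following minimal sketch counts the chunks that a rectangular sub-array query overlaps for a given chunk shape, then averages that count over a small workload. It is an illustration only: it compares a handful of candidate shapes by brute force and is not the steepest descent or geometric programming method of the paper. The function names, the inclusive query-box representation, and the sample workload are all assumptions introduced here.

```python
from math import prod


def chunks_touched(query_lo, query_hi, chunk_shape):
    """Count the chunks overlapped by the inclusive box [query_lo, query_hi].

    Assumes chunks are aligned to the array origin, so index i along a
    dimension with chunk side c falls in chunk number i // c.
    """
    return prod(
        hi // c - lo // c + 1
        for lo, hi, c in zip(query_lo, query_hi, chunk_shape)
    )


def mean_chunks(workload, chunk_shape):
    """Average number of chunks retrieved over a list of (lo, hi) queries."""
    total = sum(chunks_touched(lo, hi, chunk_shape) for lo, hi in workload)
    return total / len(workload)


def best_shape(workload, candidate_shapes):
    """Pick the candidate shape with the lowest mean chunk count (brute force)."""
    return min(candidate_shapes, key=lambda shape: mean_chunks(workload, shape))


# Hypothetical 2-D workload of wide, short queries (row range, column range).
workload = [((0, 0), (3, 63)), ((10, 5), (12, 60))]

# Candidate shapes that all hold 64 cells, so the chunk size is held fixed.
candidates = [(8, 8), (4, 16), (16, 4)]

for shape in candidates:
    print(shape, mean_chunks(workload, shape))
print("best:", best_shape(workload, candidates))
```

For this made-up workload the wide chunk shape wins, which matches the intuition that chunks elongated along the dimension that queries span most are overlapped by fewer queries per retrieved cell.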