Automated design of multidimensional clustering tables for relational databases

Authors:
Sam S. Lightstone;Bishwaranjan Bhattacharjee
Affiliations:
IBM Toronto Laboratory, Markham, Ontario, Canada;IBM T. J. Watson Research Center, Hawthorne, New York
Venue:
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Year:
2004

Citing 13

AutoAdmin “what-if” index analysis utility
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Microsoft index tuning wizard for SQL Server 7.0
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Automating physical database design in a parallel database
Proceedings of the 2002 ACM SIGMOD international conference on Management of data

Adaptive and Automated Index Selection in RDBMS
EDBT '92 Proceedings of the 3rd International Conference on Extending Database Technology: Advances in Database Technology

Automated Selection of Materialized Views and Indexes in SQL Databases
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases

Multi-Dimensional Database Allocation for Parallel Data Warehouses
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases

WARLOCK: A Data Allocation Tool for Parallel Warehouses
Proceedings of the 27th International Conference on Very Large Data Bases

Sampling-Based Estimation of the Number of Distinct Values of an Attribute
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases

DB2 Advisor: An Optimizer Smart Enough to Recommend its own Indexes
ICDE '00 Proceedings of the 16th International Conference on Data Engineering

Improving OLAP Performance by Multidimensional Hierarchical Clustering
IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications

Multi-dimensional clustering: a new data layout scheme in DB2
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Efficient query processing for multi-dimensionally clustered tables in DB2
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

DB2 design advisor: integrated automatic physical database design
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Cited 5

Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more

On multidimensional data and modern disks
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4

DB2 design advisor: integrated automatic physical database design
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Efficient bulk deletes for multi dimensional clustered tables in DB2
VLDB '07 Proceedings of the 33rd international conference on Very large data bases

A new tool for multi-level partitioning in teradata
Proceedings of the 21st ACM international conference on Information and knowledge management

Abstract

The ability to physically cluster a database table on multiple dimensions is a powerful technique that offers significant performance benefits in many OLAP, warehousing, and decision-support systems. An industrial implementation of this technique for the DB2® Universal Database™ (DB2 UDB) product, called multidimensional clustering (MDC), which co-exists with other classical forms of data storage and indexing methods, was described in VLDB 2003. This paper describes the first published model for automating the selection of clustering keys in single-dimensional and multidimensional relational databases that use a cell/block storage structure for MDC. For any significant dimensionality (3 or more), the possible solution space is combinatorially complex. The automated MDC design model is based on what-if query cost modeling, data sampling, and a search algorithm for evaluating a large constellation of possible combinations. The model is effective at trading the benefits of potential combinations of clustering keys against data sparsity and performance. It also effectively selects the granularity at which dimensions should be used for clustering (such as week of year versus month of year). We show results from experiments indicating that the model provides design recommendations of comparable quality to those made by human experts. The model has been implemented in the IBM® DB2 UDB for Linux®, UNIX® and Windows® Version 8.2 release.
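
The trade-off the abstract describes, clustering benefit against data sparsity across combinations of dimensions and granularities, can be illustrated with a small sketch. This is not the paper's algorithm. It assumes that each cell (one combination of clustering-key values) occupies whole blocks, that a fixed per-dimension `benefit` number stands in for a what-if optimizer cost estimate, and that a brute-force enumeration stands in for the paper's search. All names (`cell_counts`, `blocks_needed`, `search`, `max_expansion`) and the data are hypothetical.

```python
from itertools import product
from math import ceil


def cell_counts(rows, key_funcs):
    """Count rows per cell; a cell is one combination of clustering-key values."""
    counts = {}
    for row in rows:
        cell = tuple(f(row) for f in key_funcs)
        counts[cell] = counts.get(cell, 0) + 1
    return counts


def blocks_needed(counts, rows_per_block):
    """Every cell occupies whole blocks, so a sparse cell still costs one block."""
    return sum(ceil(n / rows_per_block) for n in counts.values())


def search(rows, candidates, rows_per_block, max_expansion):
    """Try one granularity (or none) per dimension; keep the highest-benefit
    design whose block count stays within max_expansion times the dense size.

    candidates maps dimension -> [(label, key_func, benefit), ...]; the
    benefit values are placeholders for what-if cost estimates (assumption).
    """
    dense_blocks = ceil(len(rows) / rows_per_block)
    options = [[None] + list(grans) for grans in candidates.values()]
    best = (0.0, ())
    for combo in product(*options):
        chosen = [c for c in combo if c is not None]
        if not chosen:
            continue  # no clustering key selected
        counts = cell_counts(rows, [c[1] for c in chosen])
        expansion = blocks_needed(counts, rows_per_block) / dense_blocks
        if expansion > max_expansion:
            continue  # too sparse: too many partially filled blocks
        benefit = sum(c[2] for c in chosen)
        if benefit > best[0]:
            best = (benefit, tuple(c[0] for c in chosen))
    return best


# Toy data standing in for a sample: one row per (day, region) pair.
# A real sample would undercount cells, so the count would need scaling.
rows = [{"day": d, "region": r} for d in range(360) for r in range(4)]
candidates = {
    "date": [
        ("day", lambda r: r["day"], 3.0),
        ("month", lambda r: r["day"] // 30, 2.0),
    ],
    "region": [("region", lambda r: r["region"], 1.0)],
}
best = search(rows, candidates, rows_per_block=10, max_expansion=1.5)
print(best)
```

In this toy data, clustering on `day` leaves four rows in each ten-row block, so the table grows to 2.5 times its dense size and the design is rejected. The coarser `month` granularity fills its blocks completely, which is the kind of granularity choice (week versus month) the abstract refers to.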