Multi-dimensional clustering: a new data layout scheme in DB2

Authors:
Sriram Padmanabhan;Bishwaranjan Bhattacharjee;Tim Malkemus;Leslie Cranston;Matthew Huras
Affiliations:
IBM T.J. Watson Research Center, Hawthorne, New York;IBM T.J. Watson Research Center, Hawthorne, New York;IBM T.J. Watson Research Center, Hawthorne, New York;IBM Toronto Laboratory, Markham, Ontario, Canada;IBM Toronto Laboratory, Markham, Ontario, Canada
Venue:
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Year:
2003

Abstract

We describe the design and implementation of a new data layout scheme, called multi-dimensional clustering, in DB2 Universal Database Version 8. Many applications, e.g., OLAP and data warehousing, process a table or tables in a database using a multi-dimensional access paradigm. Currently, most database systems can only support organization of a table using a primary clustering index. Secondary indexes are created to access the tables when the primary key index is not applicable. Unfortunately, secondary indexes perform many random I/O accesses against the table for a simple operation such as a range query. Our work in multi-dimensional clustering addresses this important deficiency in database systems. Multi-dimensional clustering is based on the definition of one or more orthogonal clustering attributes (or expressions) of a table. The table is organized physically by grouping records with similar values for the dimension attributes into the same cluster. We describe novel techniques for maintaining this physical layout efficiently and methods of processing database operations that provide significant performance improvements. We show results from experiments using a star-schema database to validate our claims of improved performance with minimal overhead.
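
The layout idea in the abstract can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions, not the DB2 implementation: the class name `ClusteredTable`, the fixed `BLOCK_SIZE`, and the per-dimension block indexes are all hypothetical choices made for illustration. Records that share the same combination of dimension values are stored together in blocks, and a lookup on any dimension reads only the blocks that can contain matching records.

```python
from collections import defaultdict

BLOCK_SIZE = 2  # records per block; hypothetical and tiny, for illustration only


class ClusteredTable:
    """Toy table that clusters records by the values of its dimension columns."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        self.blocks = []                      # block id -> list of records
        self.cell_blocks = defaultdict(list)  # tuple of dimension values -> block ids
        # One block-level index per dimension: value -> set of block ids (assumed design).
        self.dim_index = {d: defaultdict(set) for d in self.dimensions}

    def insert(self, record):
        cell = tuple(record[d] for d in self.dimensions)
        block_ids = self.cell_blocks[cell]
        # Reuse the cell's newest block if it has room, otherwise allocate a new block.
        if block_ids and len(self.blocks[block_ids[-1]]) < BLOCK_SIZE:
            block_id = block_ids[-1]
        else:
            block_id = len(self.blocks)
            self.blocks.append([])
            block_ids.append(block_id)
            for d, value in zip(self.dimensions, cell):
                self.dim_index[d][value].add(block_id)
        self.blocks[block_id].append(record)

    def lookup(self, **predicates):
        """Return records matching equality predicates on dimension columns."""
        candidate = None
        for d, value in predicates.items():
            ids = self.dim_index[d].get(value, set())
            candidate = ids if candidate is None else candidate & ids
        if candidate is None:  # no predicates: scan every block
            candidate = set(range(len(self.blocks)))
        # Every record in a block shares one cell, so no per-record filtering is needed.
        return [r for b in sorted(candidate) for r in self.blocks[b]]


table = ClusteredTable(["region", "year"])
rows = [("east", 2002), ("west", 2003), ("east", 2003), ("east", 2002), ("east", 2002)]
for i, (region, year) in enumerate(rows):
    table.insert({"id": i, "region": region, "year": year})

east_2002 = [r["id"] for r in table.lookup(region="east", year=2002)]
year_2003 = [r["id"] for r in table.lookup(year=2003)]
```

In this sketch a query on either dimension alone, or on both together, resolves to a set of whole blocks, which is the contrast with a secondary index that points at individual records scattered across the table.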