Multi-dimensional clustering: a new data layout scheme in DB2

  • Authors:
  • Sriram Padmanabhan; Bishwaranjan Bhattacharjee; Tim Malkemus; Leslie Cranston; Matthew Huras

  • Affiliations:
  • IBM T.J. Watson Research Center, Hawthorne, New York; IBM T.J. Watson Research Center, Hawthorne, New York; IBM T.J. Watson Research Center, Hawthorne, New York; IBM Toronto Laboratory, Markham, Ontario, Canada; IBM Toronto Laboratory, Markham, Ontario, Canada

  • Venue:
  • Proceedings of the 2003 ACM SIGMOD international conference on Management of data
  • Year:
  • 2003

Abstract

We describe the design and implementation of a new data layout scheme, called multi-dimensional clustering, in DB2 Universal Database Version 8. Many applications, e.g., OLAP and data warehousing, process a table or tables in a database using a multi-dimensional access paradigm. Currently, most database systems can only support organization of a table using a primary clustering index. Secondary indexes are created to access the tables when the primary key index is not applicable. Unfortunately, secondary indexes perform many random I/O accesses against the table for a simple operation such as a range query. Our work in multi-dimensional clustering addresses this important deficiency in database systems. Multi-Dimensional Clustering is based on the definition of one or more orthogonal clustering attributes (or expressions) of a table. The table is organized physically by associating records with similar values for the dimension attributes in a cluster. We describe novel techniques for maintaining this physical layout efficiently and methods of processing database operations that provide significant performance improvements. We show results from experiments using a star-schema database to validate our claims of performance with minimal overhead.
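
The layout idea in the abstract, grouping records that share values for the dimension attributes into one cluster, can be illustrated with a small in-memory sketch. This is a toy model under stated assumptions, not DB2's implementation: the class `ClusteredTable`, its `cells` and `dim_index` structures, and the `select` method are all hypothetical names introduced here, and Python lists stand in for physical storage.

```python
from collections import defaultdict


class ClusteredTable:
    """Toy model: records sharing all dimension values live in one cell."""

    def __init__(self, dimensions):
        self.dimensions = tuple(dimensions)
        # Cell key (one value per dimension) -> records stored together.
        self.cells = defaultdict(list)
        # Per-dimension index: dimension value -> keys of cells holding it.
        self.dim_index = {d: defaultdict(set) for d in self.dimensions}

    def insert(self, record):
        key = tuple(record[d] for d in self.dimensions)
        self.cells[key].append(record)
        for dim, value in zip(self.dimensions, key):
            self.dim_index[dim][value].add(key)

    def select(self, **predicates):
        """Return records whose dimension values satisfy every predicate.

        Each predicate is a callable applied to one dimension's value.
        Whole cells are chosen first, then read one after another.
        """
        candidates = None
        for dim, predicate in predicates.items():
            matching = set()
            for value, cell_keys in self.dim_index[dim].items():
                if predicate(value):
                    matching |= cell_keys
            candidates = matching if candidates is None else candidates & matching
        if candidates is None:
            candidates = set(self.cells)
        return [rec for key in sorted(candidates) for rec in self.cells[key]]


table = ClusteredTable(["region", "year"])
for rec in [
    {"region": "east", "year": 2001, "sales": 10},
    {"region": "west", "year": 2001, "sales": 20},
    {"region": "east", "year": 2002, "sales": 30},
    {"region": "east", "year": 2003, "sales": 40},
    {"region": "east", "year": 2001, "sales": 50},
]:
    table.insert(rec)

# Range query on one dimension combined with equality on another.
rows = table.select(
    region=lambda v: v == "east",
    year=lambda v: 2001 <= v <= 2002,
)
```

In this sketch the query touches only the two cells that match both predicates and reads each cell's records together, which is the contrast the abstract draws with a secondary index that fetches qualifying rows one at a time from scattered locations.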