Efficient bulk deletes for multi dimensional clustered tables in DB2

Authors:
Bishwaranjan Bhattacharjee;Timothy Malkemus;Sherman Lau;Sean McKeough;Jo-anne Kirton;Robin Von Boeschoten;John P Kennedy
Affiliations:
IBM T.J. Watson Research Center, Hawthorne, NY;IBM T.J. Watson Research Center, Hawthorne, NY;IBM Toronto Laboratories, Markham, Ontario, Canada;IBM Toronto Laboratories, Markham, Ontario, Canada;IBM Toronto Laboratories, Markham, Ontario, Canada;IBM Toronto Laboratories, Markham, Ontario, Canada;IBM Toronto Laboratories, Markham, Ontario, Canada
Venue:
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Year:
2007

Abstract

In data warehousing applications, the ability to efficiently delete large chunks of data from a table is very important. This operation is also known as rollout or bulk delete. Rollout is generally performed periodically and often along more than one dimension or attribute. Efficiently maintaining RID indexes during rollout is a well-known problem for database engines, and solving it is important for data warehousing applications. DB2 UDB V8.1 introduced a physical clustering scheme called Multi Dimensional Clustering (MDC), which lets users cluster the data in a table on multiple attributes or dimensions. This is useful for query processing and for maintenance activities, including deletes. Subsequently, an enhancement incorporated in DB2 UDB Viper 2 allows very efficient online rollout of data on dimensional boundaries, even when many secondary RID indexes are defined on the table. It achieves this by updating these RID indexes asynchronously in the background, while allowing the delete to commit and the table to remain accessible. This paper details the design of MDC Rollout and the challenges encountered. It presents performance results showing order-of-magnitude improvements and discusses the lessons learned.
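The central idea of the abstract can be illustrated with a toy in-memory model. A rollout on a dimensional boundary removes whole cells (and hence whole blocks) at commit time, while the secondary RID index is purged later by a background worker. This is a minimal sketch under stated assumptions, not DB2's implementation. The class `MDCTable`, its methods, the block size, the `(block_id, slot)` RID format, and the rule that index scans skip RIDs pointing into rolled-out blocks are all hypothetical choices made for illustration.

```python
import threading
from collections import defaultdict


class MDCTable:
    """Hypothetical toy MDC table: rows live in fixed-size blocks, and every cell
    (a unique combination of dimension values) owns its own list of blocks."""

    def __init__(self, dims, index_col, block_size=4):
        self.dims = dims                   # clustering dimensions, e.g. ("year", "region")
        self.index_col = index_col         # column covered by one secondary RID index
        self.block_size = block_size       # hypothetical rows per block
        self.blocks = {}                   # block_id -> list of rows
        self.cells = defaultdict(list)     # cell key -> list of block_ids
        self.rid_index = defaultdict(set)  # index key -> {(block_id, slot)} RIDs
        self.rolled_out = set()            # deleted blocks still referenced by rid_index
        self.next_block = 0
        self.lock = threading.Lock()

    def insert(self, row):
        with self.lock:
            block_ids = self.cells[tuple(row[d] for d in self.dims)]
            # Start a new block when the cell has none or its last block is full.
            if not block_ids or len(self.blocks[block_ids[-1]]) == self.block_size:
                block_ids.append(self.next_block)
                self.blocks[self.next_block] = []
                self.next_block += 1
            block = self.blocks[block_ids[-1]]
            block.append(row)
            self.rid_index[row[self.index_col]].add((block_ids[-1], len(block) - 1))

    def rollout(self, dim, value):
        """Delete every cell whose `dim` equals `value`, a whole block at a time.
        Only block bookkeeping changes; the RID index is left for the background
        cleaner, so the delete can commit immediately. Returns blocks rolled out."""
        pos = self.dims.index(dim)
        with self.lock:
            count = 0
            for cell in [c for c in self.cells if c[pos] == value]:
                block_ids = self.cells.pop(cell)
                self.rolled_out.update(block_ids)
                count += len(block_ids)
            return count

    def lookup(self, key):
        """Index scan that skips RIDs pointing into rolled-out blocks."""
        with self.lock:
            return [self.blocks[b][s] for b, s in sorted(self.rid_index.get(key, ()))
                    if b not in self.rolled_out]

    def cleanup(self):
        """Background index cleanup: purge stale RIDs, then free their blocks.
        Returns the number of index entries removed."""
        with self.lock:
            removed = 0
            for key in list(self.rid_index):
                stale = {rid for rid in self.rid_index[key] if rid[0] in self.rolled_out}
                removed += len(stale)
                self.rid_index[key] -= stale
                if not self.rid_index[key]:
                    del self.rid_index[key]
            for b in self.rolled_out:
                del self.blocks[b]
            self.rolled_out.clear()
            return removed


if __name__ == "__main__":
    t = MDCTable(dims=("year", "region"), index_col="customer", block_size=2)
    for i, (y, r) in enumerate([(2006, "east"), (2006, "east"), (2006, "east"),
                                (2006, "west"), (2007, "east")]):
        t.insert({"year": y, "region": r, "customer": i % 2, "amount": 10 * i})
    print(t.rollout("year", 2006))      # 3 blocks, committed without touching the index
    worker = threading.Thread(target=t.cleanup)  # index cleanup runs in the background
    worker.start()
    print(t.lookup(0))                  # queries proceed; stale RIDs are skipped
    worker.join()
```

The design choice being illustrated is that the cost of a rollout in this model scales with the number of blocks removed, not with the number of rows times the number of secondary indexes. Index maintenance is deferred to `cleanup`, which can run concurrently with queries because stale entries are already invisible to them.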