In data warehousing applications, the ability to efficiently delete large chunks of data from a table is very important. This operation is known as rollout or bulk delete. Rollout is generally carried out periodically and often on more than one dimension or attribute. Efficiently updating RID indexes during a rollout is a well-known problem for database engines, and solving it is essential for data warehousing applications. DB2 UDB V8.1 introduced a physical clustering scheme called Multi-Dimensional Clustering (MDC), which lets users cluster the data in a table on multiple attributes or dimensions. This benefits query processing and maintenance activities, including deletes. Subsequently, an enhancement in DB2 UDB Viper 2 enabled very efficient online rollout of data on dimensional boundaries, even when many secondary RID indexes are defined on the table. It works by updating these RID indexes asynchronously in the background while allowing the delete to commit and the table to be accessed. This paper details the design of MDC Rollout and the challenges encountered. It presents performance results that show order-of-magnitude improvements and discusses the lessons learnt.
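The idea of committing a rollout first and cleaning the RID indexes afterwards can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not DB2's implementation: the class `ToyMdcTable`, its methods, and the `pending` set are hypothetical names introduced here, and the model ignores logging, locking, and concurrency entirely. It assumes each block holds rows for a single dimension value, and that index lookups can filter out RIDs that point into blocks that have been rolled out but not yet cleaned.

```python
from collections import defaultdict


class ToyMdcTable:
    """Toy model (hypothetical): rows live in blocks, one dimension value per block."""

    def __init__(self):
        self.blocks = {}                      # block_id -> {rid: row}
        self.block_index = defaultdict(set)   # dimension value -> block ids
        self.rid_index = defaultdict(set)     # secondary key -> RIDs (block_id, slot)
        self.pending = set()                  # rolled-out blocks awaiting index cleanup
        self._next_block = 0                  # block ids are never reused

    def insert_block(self, dim_value, rows):
        """Store rows (dicts with a 'key' field) in a new block for dim_value."""
        block_id = self._next_block
        self._next_block += 1
        self.blocks[block_id] = {}
        self.block_index[dim_value].add(block_id)
        for slot, row in enumerate(rows):
            rid = (block_id, slot)
            self.blocks[block_id][rid] = row
            self.rid_index[row["key"]].add(rid)
        return block_id

    def rollout(self, dim_value):
        """Delete every block for dim_value; RID index entries are left for later."""
        for block_id in self.block_index.pop(dim_value, set()):
            del self.blocks[block_id]
            self.pending.add(block_id)

    def lookup(self, key):
        """Index lookup that skips stale RIDs pointing into rolled-out blocks."""
        return [self.blocks[rid[0]][rid]
                for rid in sorted(self.rid_index.get(key, ()))
                if rid[0] not in self.pending]

    def cleanup(self):
        """Deferred step: purge stale RIDs, then forget the pending blocks."""
        for key in list(self.rid_index):
            self.rid_index[key] = {r for r in self.rid_index[key]
                                   if r[0] not in self.pending}
            if not self.rid_index[key]:
                del self.rid_index[key]
        self.pending.clear()


table = ToyMdcTable()
table.insert_block("2006-Q1", [{"key": "a", "v": 1}, {"key": "b", "v": 2}])
table.insert_block("2006-Q2", [{"key": "a", "v": 3}])
table.rollout("2006-Q1")       # returns without touching rid_index
print(table.lookup("a"))       # stale Q1 entries are filtered out before cleanup
table.cleanup()                # in a real system this would run in the background
```

In this sketch the cost of `rollout` depends only on the number of blocks removed, while the work proportional to the number of index entries is moved into `cleanup`, which is the trade-off the abstract describes.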