Write-optimized B-trees

  • Authors:
  • Goetz Graefe

  • Affiliations:
  • Microsoft

  • Venue:
  • VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Large writes are beneficial both on individual disks and on disk arrays, e.g., RAID-5. The presented design enables large writes of internal B-tree nodes and leaves. It supports both in-place updates and large append-only ("log-structured") write operations within the same storage volume, within the same B-tree, and even at the same time. The essence of the proposal is to make page migration inexpensive, to migrate pages while writing them, and to make such migration optional rather than mandatory as in log-structured file systems. The inexpensive page migration also aids traditional defragmentation as well as consolidation of free space needed for future large writes. These advantages are achieved with a very limited modification to conventional B-trees that also simplifies other B-tree operations, e.g., key range locking and compression. Prior proposals and prototypes implemented transacted B-tree on top of log-structured file systems and added transaction support to log-structured file systems. Instead, the presented design adds techniques and performance characteristics of log-structured file systems to traditional B-trees and their standard transaction support, notably without adding a layer of indirection for locating B-tree nodes on disk. The result retains fine-granularity locking, full transactional ACID guarantees, fast search performance, etc. expected of a modern B-tree implementation, yet adds efficient transacted page relocation and large, high-bandwidth writes.