Utilization of B-trees with inserts, deletes and modifies

  • Authors:
  • T. Johnson; D. Shasha

  • Affiliations:
  • Courant Institute of Mathematical Sciences, New York University; Courant Institute of Mathematical Sciences, New York University

  • Venue:
  • PODS '89 Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
  • Year:
  • 1989

Abstract

The utilization of B-tree nodes determines the number of levels in the B-tree and hence its performance. Until now, the only analytical aid for determining a B-tree's utilization has been the analysis by Yao and related work. Yao showed that the utilization of B-tree nodes under pure inserts is 69%. We derive analytically, and verify by simulation, the utilization of B-tree nodes constructed from N inserts followed by M modifies (where M ≫ N), where each modify is a delete followed by an insert. Assuming that nodes merge only when they are empty (the technique used in most database management systems), we show that the utilization approaches 39% as M becomes large. We extend this model to a parameterized mixture of inserts and modifies. Surprisingly, if the modifies are mixed with just 10% inserts, the utilization is over 62%. We also calculate the probabilities of splitting and merging, and derive a simple rule of thumb that accurately estimates the probability of splitting. We present two models for computing the utilization; the more accurate one remembers items that were inserted into a node and then deleted from it. We call such items ghosts.
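
The workload described in the abstract can be explored with a small experiment. The sketch below is not the authors' analytical model or their simulator. It is a minimal leaf-level simulation under assumptions of our own: keys are uniform random floats, a full leaf splits in half on overflow, a leaf is removed only when it becomes empty, and each modify deletes a uniformly chosen existing key and inserts a fresh one. The function name `simulate` and its parameters are hypothetical.

```python
import bisect
import random


def simulate(n_inserts, n_modifies, capacity=20, seed=0):
    """Return (utilization after the inserts, utilization after the modifies).

    Only the leaf level is modelled. Assumptions: uniform random keys,
    split-in-half on overflow, merge-at-empty on delete.
    """
    rng = random.Random(seed)
    leaves = [[]]              # each leaf is a sorted list of keys
    lows = [float("-inf")]     # lows[i] is the lower bound of leaf i's key range
    keys = []                  # all live keys, used to pick a deletion victim

    def find_leaf(key):
        # Index of the leaf whose key range contains `key`.
        return bisect.bisect_right(lows, key) - 1

    def insert(key):
        i = find_leaf(key)
        leaf = leaves[i]
        bisect.insort(leaf, key)
        keys.append(key)
        if len(leaf) > capacity:           # overflow: split into two halves
            mid = len(leaf) // 2
            right = leaf[mid:]
            del leaf[mid:]
            leaves.insert(i + 1, right)
            lows.insert(i + 1, right[0])

    def delete_random():
        j = rng.randrange(len(keys))       # uniform choice among live keys
        keys[j], keys[-1] = keys[-1], keys[j]
        key = keys.pop()
        i = find_leaf(key)
        leaves[i].remove(key)
        if not leaves[i] and len(leaves) > 1:   # merge-at-empty
            del leaves[i]
            del lows[i]
            lows[0] = float("-inf")        # first leaf always covers the low end

    def utilization():
        return len(keys) / (len(leaves) * capacity)

    for _ in range(n_inserts):
        insert(rng.random())
    after_inserts = utilization()

    for _ in range(n_modifies):            # modify = delete followed by insert
        delete_random()
        insert(rng.random())
    return after_inserts, utilization()


if __name__ == "__main__":
    u_ins, u_mod = simulate(5000, 100000, capacity=20)
    print(f"after inserts:  {u_ins:.2f}")
    print(f"after modifies: {u_mod:.2f}")
```

Running it with M much larger than N lets the two regimes be compared side by side. How closely the measured values track the 69% and 39% figures quoted above depends on the node capacity and on the simplifications listed in the lead-in.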