Performance guarantees for B-trees with different-sized atomic keys

  • Authors:
  • Michael A. Bender;Haodong Hu;Bradley C. Kuszmaul

  • Affiliations:
  • Stony Brook University, Stony Brook, NY, USA;Microsoft, Redmond, WA, USA;MIT, Cambridge, MA, USA

  • Venue:
  • Proceedings of the twenty-ninth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Most B-tree papers assume that all N keys have the same size K, that F = B/K keys fit in a disk block, and therefore that the search cost is O(logf+1 N) block transfers. When keys have variable size, however, B-tree operations have no nontrivial performance guarantees. This paper provides B-tree-like performance guarantees on dictionaries that contain keys of different sizes in a model in which keys must be stored and compared as opaque objects. The resulting atomic-key dictionaries exhibit performance bounds in terms of the average key size and match the bounds when all keys are the same size. Atomic key dictionaries can be built with minimal modification to the B-tree structure, simply by choosing the pivot keys properly. This paper describes both static and dynamic atomic-key dictionaries. In the static case, if there are N keys with average size K, the search cost is O(⌈K/B⌉ log1+⌈K/B⌉ N) expected transfers. The paper proves that it is not possible to transform these expected bounds into worst-case bounds. The cost to build the tree is O(NK) operations and O(NK/B) transfers if all keys are presented in sorted order. If not, the cost is the sorting cost. For the dynamic dictionaries, the amortized cost to insert a key κ of arbitrary length at an arbitrary rank is dominated by the cost to search for κ. Specifically the amortized cost to insert a key κ of arbitrary length and random rank is O(⌈K/B⌉ log1+⌈K/B⌉ N + |κ| /B) transfers. A dynamic-programming algorithm is shown for constructing a search tree with minimal expected cost.