Dynamic Indexability and the Optimality of B-Trees

Authors: Ke Yi
Affiliations: Hong Kong University of Science and Technology
Venue: Journal of the ACM (JACM)
Year: 2012

Abstract

One-dimensional range queries, one of the most basic types of queries in databases, have been studied extensively in the literature. For large databases, the goal is to build an external index that is optimized for disk block accesses (or I/Os). The problem is well understood in the static case. Theoretically, there exists an index of linear size that can answer a range query in $O(1 + K/B)$ I/Os, where $K$ is the output size and $B$ is the disk block size, but it is highly impractical. In practice, the standard solution is the B-tree, which answers a query in $O(\log_B (N/M) + K/B)$ I/Os on a data set of size $N$, where $M$ is the main memory size. For typical values of $N$, $M$, and $B$, $\log_B (N/M)$ can be considered a constant. However, the problem is still wide open in the dynamic setting, where insertions and deletions of records are to be supported. With smart buffering, it is possible to speed up updates significantly, to $o(1)$ I/Os amortized. Indeed, several dynamic B-trees have been proposed, but they all cause some degradation in query performance, with the most interesting tradeoff point at $O(\frac{1}{B} \log (N/M))$ I/Os for updates and $O(\log (N/M) + K/B)$ I/Os for queries. In this article, we prove that the query-update tradeoffs of all the known dynamic B-trees are optimal when $\log_B (N/M)$ is a constant. This implies that one should not hope for substantially better solutions for all practical values of the parameters. Our lower bounds hold in a dynamic version of the indexability model, which is of independent interest. Dynamic indexability is a clean yet powerful model for studying dynamic indexing problems, and can potentially lead to more interesting lower bound results.
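To give a feel for the magnitudes involved, the following Python sketch evaluates the bounds from the abstract, read as log_B(N/M) + K/B for a static B-tree query, (1/B) log(N/M) for a buffered update, and log(N/M) + K/B for the corresponding query. It is only an illustration, not part of the article: the function names and the parameter values are hypothetical, all constant factors hidden by the O-notation are dropped, and the unsubscripted logarithm is assumed to be base 2.

```python
import math


def static_btree_query_ios(n, m, b, k):
    """Static B-tree query bound, log_B(N/M) + K/B, constants dropped."""
    return math.log(n / m, b) + k / b


def buffered_update_ios(n, m, b):
    """Amortized update bound at the tradeoff point, (1/B) * log2(N/M).

    Base 2 is an assumption; the abstract leaves the base unspecified.
    """
    return math.log2(n / m) / b


def buffered_query_ios(n, m, b, k):
    """Query bound at the same tradeoff point, log2(N/M) + K/B."""
    return math.log2(n / m) + k / b


# Hypothetical parameters (not taken from the article):
N = 2**40  # records in the data set
M = 2**30  # records that fit in main memory
B = 2**10  # records per disk block
K = 2**10  # records reported by one range query

if __name__ == "__main__":
    print("static query I/Os   :", static_btree_query_ios(N, M, B, K))
    print("buffered update I/Os:", buffered_update_ios(N, M, B))
    print("buffered query I/Os :", buffered_query_ios(N, M, B, K))
```

With these illustrative values, log_B(N/M) equals 1, which is the sense in which the abstract treats it as a constant, while the buffered update cost falls well below one I/O at the price of a larger logarithmic term in the query bound.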