The limits of buffering: a tight lower bound for dynamic membership in the external memory model

Authors:
Elad Verbin;Qin Zhang
Affiliations:
ITCS, Tsinghua University, Beijing, China;HKUST, Hong Kong, China
Venue:
Proceedings of the forty-second ACM symposium on Theory of computing
Year:
2010

Abstract

We study the dynamic membership (or dynamic dictionary) problem, which is one of the most fundamental problems in data structures. We study the problem in the external memory model with cell size b bits and cache size m bits. We prove that if the amortized cost of updates is at most 0.999 (or any other constant < 1), then the query cost must be Ω(log_{b log n}(n/m)), where n is the number of elements in the dictionary. In contrast, when the update time is allowed to be 1 + o(1), then a bit vector or hash table gives query time O(1). Thus, this is a threshold phenomenon for data structures. This lower bound answers a folklore conjecture of the external memory community. Since almost any data structure task can solve membership, our lower bound implies a dichotomy between two alternatives: (i) make the amortized update time at least 1 (so the data structure does not buffer, and we lose one of the main potential advantages of the cache), or (ii) make the query time at least roughly logarithmic in n. Our result holds even when the updates and queries are chosen uniformly at random and there are no deletions; it holds for randomized data structures, holds when the universe size is O(n), and does not make any restrictive assumptions such as indivisibility. All of the lower bounds we prove hold regardless of the space consumption of the data structure, while the upper bounds only need linear space. The lower bound has some striking implications for external memory data structures. It shows that the query complexities of many problems, such as 1D-range counting, predecessor, and rank-select, are all the same in the regime where the amortized update time is less than 1, as long as the cell size is large enough (b = polylog(n) suffices).
The proof of our lower bound is based on a new combinatorial lemma called the Lemma of Surprising Intersections (LOSI) which allows us to use a proof methodology where we first analyze the intersection structure of the positive queries by using encoding arguments, and then use statistical arguments to deduce properties of the intersection structure of all queries, even the negative ones. In most other data structure arguments that we know, it is difficult to argue anything about the negative queries. Therefore we believe that the LOSI and this proof methodology might find future uses for other problems.
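
To make the two sides of the threshold concrete, the following Python sketch is offered as an illustration only; it is not taken from the paper. It assumes a toy accounting in which reading and rewriting one b-bit cell counts as a single I/O and the cache is ignored. The class name BitVectorDictionary and the function query_lower_bound_shape are hypothetical names introduced here. The class shows the unbuffered side, where a bit vector pays one probe per update and one per query. The function evaluates only the expression log_{b log n}(n/m) from the abstract, dropping the constant hidden by the Ω and assuming a base-2 logarithm inside the base.

```python
import math


class BitVectorDictionary:
    """Toy bit vector over the universe [0, universe_size), stored in b-bit cells.

    Assumption of this sketch: every cell touched counts as one I/O and there
    is no cache, so the probe counter is the whole cost model.
    """

    def __init__(self, universe_size, cell_bits):
        self.b = cell_bits
        self.cells = [0] * math.ceil(universe_size / cell_bits)
        self.probes = 0  # total cells touched so far

    def insert(self, x):
        cell, offset = divmod(x, self.b)
        self.probes += 1  # one cell read and rewritten, counted as one I/O
        self.cells[cell] |= 1 << offset

    def query(self, x):
        cell, offset = divmod(x, self.b)
        self.probes += 1  # one cell read
        return (self.cells[cell] >> offset) & 1 == 1


def query_lower_bound_shape(n, m, b):
    """Evaluate log_{b log n}(n / m), without the constant hidden by Omega.

    Assumption of this sketch: the inner logarithm is taken base 2.
    """
    return math.log(n / m) / math.log(b * math.log2(n))


d = BitVectorDictionary(universe_size=1024, cell_bits=64)
d.insert(5)
d.insert(700)
found = d.query(5)
missing = d.query(6)
```

In this accounting each operation touches exactly one cell, which matches the regime where the update cost is about 1 and queries take O(1). The lower bound in the abstract concerns the other regime, where updates are buffered so that their amortized cost drops below 1.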