We study the dynamic membership (or dynamic dictionary) problem, which is one of the most fundamental problems in data structures. We study the problem in the external memory model with cell size b bits and cache size m bits. We prove that if the amortized cost of updates is at most 0.999 (or any other constant less than 1), then the query cost must be Ω(log_{b log n}(n/m)), where n is the number of elements in the dictionary. In contrast, when the update time is allowed to be 1 + o(1), a bit vector or hash table gives query time O(1). Thus, this is a threshold phenomenon for data structures. This lower bound answers a folklore conjecture of the external memory community. Since almost any data structure task can solve membership, our lower bound implies a dichotomy between two alternatives: (i) make the amortized update time at least 1 (so the data structure does not buffer, and we lose one of the main potential advantages of the cache), or (ii) make the query time at least roughly logarithmic in n. Our result holds even when the updates and queries are chosen uniformly at random and there are no deletions; it holds for randomized data structures, holds when the universe size is O(n), and does not make any restrictive assumptions such as indivisibility. All of the lower bounds we prove hold regardless of the space consumption of the data structure, while the upper bounds only need linear space. The lower bound has some striking implications for external memory data structures. It shows that the query complexities of many problems, such as 1D-range counting, predecessor, rank-select, and many others, are all the same in the regime where the amortized update time is less than 1, as long as the cell size is large enough (b = polylog(n) suffices).
The proof of our lower bound is based on a new combinatorial lemma called the Lemma of Surprising Intersections (LOSI). It enables a proof methodology in which we first analyze the intersection structure of the positive queries using encoding arguments, and then use statistical arguments to deduce properties of the intersection structure of all queries, even the negative ones. In most other data structure arguments that we know of, it is difficult to argue anything about the negative queries. We therefore believe that the LOSI and this proof methodology might find future uses for other problems.
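
To make the two sides of the threshold concrete, the following is a minimal Python sketch, not taken from the paper. It assumes a universe of size O(n) stored as a bit vector split into cells of b bits, and it counts one cell probe per operation while ignoring the cache entirely. The class `BitVectorDictionary` and the helper `query_lower_bound` are hypothetical names introduced only for illustration. The helper evaluates the expression log_{b log n}(n/m) with the inner logarithm taken base 2 (an assumption) and without the constant hidden by the Ω.

```python
import math


class BitVectorDictionary:
    """Membership over the universe {0, ..., u-1}, stored in cells of b bits.

    Illustrative sketch: every insert and every query touches exactly one
    cell, so the update cost is 1 probe and the query cost is O(1) probes.
    The cache of the external memory model is deliberately not modelled.
    """

    def __init__(self, universe_size, cell_bits):
        self.b = cell_bits
        self.cells = [0] * math.ceil(universe_size / cell_bits)
        self.probes = 0  # total number of cell accesses so far

    def _locate(self, x):
        # Returns (cell index, bit offset inside that cell).
        return divmod(x, self.b)

    def insert(self, x):
        cell, bit = self._locate(x)
        self.probes += 1
        self.cells[cell] |= 1 << bit

    def query(self, x):
        cell, bit = self._locate(x)
        self.probes += 1
        return (self.cells[cell] >> bit) & 1 == 1


def query_lower_bound(n, m, b):
    """Evaluate log_{b log n}(n/m), ignoring the constant hidden by Omega.

    Assumes the inner logarithm is base 2; the abstract does not fix it.
    """
    return math.log(n / m) / math.log(b * math.log2(n))


d = BitVectorDictionary(universe_size=1000, cell_bits=64)
d.insert(5)
d.insert(700)
print(d.query(5), d.query(6), d.probes)
print(query_lower_bound(n=2**30, m=2**20, b=1024))
```

The bit vector sits on the unbuffered side of the dichotomy: it pays a full probe per update and in exchange answers queries in a single probe. The second function only evaluates the growth term of the bound for sample parameters; it says nothing about the constants involved.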