The cell probe complexity of dynamic data structures

  • Authors:
  • M. Fredman; M. Saks

  • Affiliations:
  • Bellcore and U.C. San Diego; U.C. San Diego, Bellcore and Rutgers University

  • Venue:
  • STOC '89 Proceedings of the twenty-first annual ACM symposium on Theory of computing
  • Year:
  • 1989

Abstract

Dynamic data structure problems involve the representation of data in memory in such a way as to permit certain types of modifications of the data (updates) and certain types of questions about the data (queries). This paradigm encompasses many fundamental problems in computer science.

The purpose of this paper is to prove new lower and upper bounds on the time per operation required to implement solutions to some familiar dynamic data structure problems, including list representation, subset ranking, partial sums, and the set union problem. The main features of our lower bounds are:

They hold in the cell probe model of computation (A. Yao [18]), in which the time complexity of a sequential computation is defined to be the number of words of memory that are accessed; the number of bits b in a single word of memory is a parameter of the model, and all other computation is free. This model is at least as powerful as a random access machine and allows for unusual representations of data, indirect addressing, etc. This contrasts with most previous lower bounds, which are proved in models (e.g., algebraic, comparison, pointer manipulation) that restrict the way data may be represented and manipulated.

The lower bound method presented here can be used to derive amortized complexities, worst case per operation complexities, and randomized complexities.

The results occasionally provide (nearly tight) tradeoffs between the number R of words of memory read per operation, the number W of memory words rewritten per operation, and the size b of each word. For the problems considered here there is a parameter n representing the size of the data set being manipulated, and for these problems b = log n is a natural register size to consider. By letting b vary, our results illustrate the effect of register size on time complexity.
For instance, one consequence of the results is that for some of the problems considered here, increasing the register size from log n to polylog(n) reduces the time complexity by only a constant factor. On the other hand, decreasing the register size from log n to 1 increases the time complexity by a log n factor for one of the problems we consider, but by only a log log n factor for some other problems.

The first two specific data structure problems for which we obtain bounds are:

List Representation. This problem concerns the representation of an ordered list of at most n (not necessarily distinct) elements from the universe U = {1, 2, …, n}. The operations to be supported are report(k), which returns the kth element of the list; insert(k, u), which inserts element u into the list between the elements in positions k - 1 and k; and delete(k), which deletes the kth item.

Subset Rank. This problem concerns the representation of a subset S of U = {1, 2, …, n}. The operations that must be supported are the updates "insert item j into the set" and "delete item j from the set", and the queries rank(j), which return the number of elements in S that are less than or equal to j.

The natural word size for these problems is b = log n, which allows an item of U or an index into the list to be stored in one register. One simple solution to the list representation problem is to maintain a vector v whose kth entry contains the kth item of the list. The report operation can be done in constant time, but the insert and delete operations may take time linear in the length of the list. Alternatively, one could store the items of the list with each element holding a pointer to its predecessor and successor in the list. This allows constant time updates (given a pointer to the appropriate location), but requires linear cost for queries.

This problem can be solved much more efficiently by use of balanced trees (such as AVL trees). When b = log n, the worst case cost per operation using AVL trees is O(log n).
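The simple vector solution can be sketched as follows (an illustrative Python fragment, not taken from the paper): report is a single access, while insert and delete must shift up to n elements, which is the source of their linear cost.

```python
# Naive vector representation of an ordered list (illustrative sketch).
# report(k) costs O(1); insert/delete cost O(n) because elements shift.

class VectorList:
    def __init__(self):
        self.items = []

    def report(self, k):
        """Return the kth element (1-indexed): a single access."""
        return self.items[k - 1]

    def insert(self, k, u):
        """Insert u between positions k - 1 and k: shifts O(n) elements."""
        self.items.insert(k - 1, u)

    def delete(self, k):
        """Delete the kth item: shifts O(n) elements."""
        del self.items[k - 1]
```

The doubly linked list alternative mentioned above inverts this tradeoff (constant time updates, linear time queries); balanced trees interpolate between the two at O(log n) per operation.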
If instead b = 1, so that each bit access costs 1, then the AVL tree solution requires O(log^2 n) time per operation.

It is not hard to find similar upper bounds for the subset rank problem (the algorithms for this problem are actually simpler than AVL trees). The question is: are these upper bounds best possible? Our results show that the upper bounds for the case of log n bit registers are within a log log n factor of optimal. On the other hand, somewhat surprisingly, for the case of single bit registers there are implementations of both of these problems that run significantly faster than O(log^2 n) time per operation.

Let CPROBE(b) denote the cell probe computational model with register size b.

Theorem 1. If b ≤ (log n)^t for some t, then any CPROBE(b) implementation of either list representation or subset rank requires Ω(log n / log log n) amortized time per operation.

Theorem 2. Subset rank and list representation have CPROBE(1) implementations with respective complexities O((log n)(log log n)) and O((log n)(log log n)^2) per operation.

Paul Dietz (personal communication) has found an implementation of list representation with log n bit registers that requires only O(log n / log log n) time per operation; thus the result of Theorem 1 is best possible.

The lower bounds of Theorem 1 are derived from lower bounds for a third problem:

Partial Sums mod k. An array A[1], …, A[n] of integers mod k is to be represented. The updates are add(i, δ), which implements A[i] ← A[i] + δ, and the queries are sum(j), which returns Σ_{i≤j} A[i] (mod k). This problem is denoted PS(n, k). Our main lower bound theorems provide tradeoffs between the number of register rewrites and register reads as a function of n, k, and b. Two corollaries of these results are:

Theorem 3. Any CPROBE(b) implementation of PS(n, 2) (partial sums mod 2) requires Ω(log n / (log log n + log b)) amortized time per operation, and for b ≥ log n there is an implementation that achieves this.
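The matching upper bound for PS(n, k) on log n-bit registers is achieved by standard techniques; a binary indexed (Fenwick-style) tree is one such scheme. The sketch below is illustrative and not taken from the paper: both operations touch O(log n) array cells.

```python
# Binary indexed (Fenwick) tree for PS(n, k): partial sums mod k.
# add(i, delta) and prefix_sum(j) each touch O(log n) array cells,
# i.e., O(log n) word probes when a cell fits in one register.

class PartialSums:
    def __init__(self, n, k):
        self.n, self.k = n, k
        self.tree = [0] * (n + 1)  # 1-indexed internal array

    def add(self, i, delta):
        """Implements A[i] <- A[i] + delta (mod k)."""
        while i <= self.n:
            self.tree[i] = (self.tree[i] + delta) % self.k
            i += i & (-i)          # step to the next covering node

    def prefix_sum(self, j):
        """Returns A[1] + ... + A[j] (mod k)."""
        s = 0
        while j > 0:
            s = (s + self.tree[j]) % self.k
            j -= j & (-j)          # drop the lowest set bit
        return s
```

With k = 2 and b ≥ log n this realizes the O(log n / (log log n + log b))-matching regime only up to the constant-factor analysis in the paper; the sketch shows the classical O(log n) word-probe structure, not the refined tradeoff.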
In particular, if b = Θ((log n)^c) for some constant c, then the optimal time complexity of PS(n, 2) is Θ(log n / log log n).

Theorem 4. Any CPROBE(1) implementation of PS(n, n) (i.e., with single bit registers) requires Ω((log n / log log n)^2) amortized time per operation, and there is an implementation that achieves O(log^2 n) time per operation.

It can be shown that a lower bound for PS(n, 2) is also a lower bound for both list representation and subset rank (the details, which are not difficult, are omitted from this report), and thus Theorem 1 follows from Theorem 3. The results of Theorem 4 make an interesting contrast with those of Theorem 2. For the three problems list representation, subset rank, and PS(n, k), there are standard algorithms that can be implemented on a CPROBE(log n) in time O(log n) per operation, and whose implementations on a CPROBE(1) require O(log^2 n) time. Theorem 4 says that for the problem PS(n, n) this algorithm is essentially best possible, while Theorem 2 says that for list representation and rank the algorithm can be significantly improved. In fact, the rank problem can be viewed as the special case of PS(n, n) in which the variables take on values in {0, 1}, and apparently this specialization is enough to reduce the complexity on a CPROBE(1) by a factor of log n / log log n, even though on a CPROBE(log n) the complexities of the two problems differ by no more than a log log n factor.

The third problem we consider is the set union problem. This problem concerns the design of a data structure for the on-line manipulation of sets in the following setting. Initially there are n singleton sets {1}, {2}, …, {n}, with i chosen as the name of the set {i}. The data structure is required to implement two operations, Find(j) and Union(A, B, C). The operation Find(j) returns the name of the set containing j. The operation Union(A, B, C) combines the sets with names A and B into a set named C.
The names of the sets existing at any moment must be unique and chosen to be integers in the range from 1 to 2n. The sets existing at any time are disjoint and define a partition of the elements into equivalence classes.

A well known data structure for the set union problem represents the sets as trees and stores the name of a set in the root of its corresponding tree. A Union operation is performed by attaching the root of the smaller set as a child of the root of the larger set (the weight rule). A Find operation is implemented by following the path from the appropriate node to the root of the tree containing it, and then redirecting to the root the parent pointers of the nodes encountered along this path (path compression). From now on we consider sequences consisting of n - 1 Union operations and m Find operations, with m ≥ n. Tarjan [14] demonstrated that the above algorithm requires time Θ(m α(m, n)), where α(m, n) is an inverse of Ackermann's function, to execute n - 1 Unions and m Finds. In particular, if m = Θ(n), then the running time is almost, but not quite, linear. Tarjan conjectured [14] that no linear time algorithm exists for the set union problem, and provided significant evidence in favor of this conjecture (which we discuss in the following section). We affirm Tarjan's conjecture in the CPROBE(log n) model.

Theorem 5. Any CPROBE(log n) implementation of the set union problem requires Ω(m α(m, n)) time to execute m Finds and n - 1 Unions, beginning with n singleton sets.

N. Blum [2] has given an O(log n / log log n) algorithm (worst case time per operation) for the set union problem. This algorithm is also optimal in the CPROBE(polylog n) model.

Section 2 provides further discussion of these results, Section 3 outlines our lower bound method, and Section 4 contains some proofs.
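The tree-based algorithm with the weight rule and path compression can be sketched as follows. This is an illustrative Python version, not taken from the paper; the `name` and `root_of` arrays, which map between set names and tree roots, are bookkeeping conventions of this sketch.

```python
# Union-find with the weight rule and path compression.
# parent[x] points to x's parent; a root is its own parent and
# stores the name of its set.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n + 1))       # elements 1..n, each its own root
        self.size = [1] * (n + 1)
        self.name = list(range(n + 1))         # set name, stored at the root
        self.root_of = list(range(2 * n + 1))  # root of the set with a given
                                               # name; names may go up to 2n

    def find(self, j):
        """Return the name of the set containing j (path compression)."""
        root = j
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[j] != root:          # second pass: redirect pointers
            self.parent[j], j = root, self.parent[j]
        return self.name[root]

    def union(self, a, b, c):
        """Merge the sets named a and b; the result is named c (weight rule)."""
        ra, rb = self.root_of[a], self.root_of[b]
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra                    # attach the smaller tree's root
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.name[ra] = c
        self.root_of[c] = ra
```

Tarjan's Θ(m α(m, n)) bound, and the matching cell probe lower bound of Theorem 5, apply to exactly this combination of the weight rule and path compression.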