Optimal self-adjusting trees for dynamic string data in secondary storage
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Data warehouses increasingly store and manage large-scale string data and handle large volumes of transactions that update and search it. Motivated by this context, we initiate the study of self-adjusting data structures for string dictionary operations, that is, data structures designed to be efficient on an entire sequence of operations rather than on individual string operations. Furthermore, we study this problem in the external-memory model, where string data is too massive to be stored in internal memory and has to reside on disk; each access to a disk page fetches $B$ items, and the cost of an operation is the number of pages accessed (I/Os). We show that, given a dictionary of $n$ strings $S_1, \ldots, S_n$ of total length $N$, a sequence of $m$ string searches $S_{i_1}, S_{i_2}, \ldots, S_{i_m}$ takes $O\left(\sum_{j=1}^{m} \frac{|S_{i_j}|}{B} + \sum_{i=1}^{n} n_i \log_B \frac{m}{n_i}\right)$ expected I/Os, where $n_i$ is the number of times $S_i$ is queried. Inserting or deleting a string $S$ takes $O\left(\frac{|S|}{B} + \log_B n\right)$ expected amortized I/Os. The $\sum_{j=1}^{m} \frac{|S_{i_j}|}{B}$ term is a lower bound for reading the input; the $\sum_{i=1}^{n} n_i \log_B \frac{m}{n_i}$ term is the entropy of the query sequence and is a standard information-theoretic lower bound. This is the Static Optimality Theorem for external-memory string access. The static optimality theorem was first formalized and proved by Tarjan and Sleator for numerical dictionaries in their classic "splay" trees paper in 1985 [16]; they left open the B-tree case for numerical values (page 684) and, a fortiori, the case of string data in an external-memory setting, which we settle here. We obtain our result not by using traditional "splay" operations on search trees as in [16], but by designing a novel and conceptually simple self-adjusting data structure based on the well-known skip lists.
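To make the two terms of the search bound concrete, the following minimal sketch evaluates them for a given query sequence. It is only an illustration of the formula stated in the abstract, not part of the paper's data structure. The function name `static_optimality_bound` and its parameters are hypothetical, and the sketch ignores the constant factor hidden by the O-notation. Strings that are never queried have n_i = 0 and are taken to contribute nothing to the entropy term.

```python
import math
from collections import Counter


def static_optimality_bound(queries, B):
    """Evaluate the two terms of the search bound for a query sequence.

    queries: list of strings, the searched strings S_{i_1}, ..., S_{i_m}
    B: number of items fetched per disk page (assumed to be at least 2)
    Returns (read_term, entropy_term), without hidden constant factors.
    """
    if B < 2:
        raise ValueError("B must be at least 2 for log base B to be defined")
    m = len(queries)
    # First term: sum over all queries of |S_{i_j}| / B (cost of reading the input).
    read_term = sum(len(s) for s in queries) / B
    # Second term: sum over distinct queried strings of n_i * log_B(m / n_i).
    counts = Counter(queries)
    entropy_term = sum(n_i * math.log(m / n_i, B) for n_i in counts.values())
    return read_term, entropy_term


if __name__ == "__main__":
    # Two strings, each queried twice, with a page size of B = 2 items.
    read_term, entropy_term = static_optimality_bound(["ab", "ab", "cdef", "cdef"], 2)
    print(read_term, entropy_term)
```

In this example the total query length is 12, so the reading term is 12 / 2 = 6, and each of the two strings contributes 2 · log_2(4 / 2) = 2 to the entropy term. A sequence that repeats a single string has an entropy term of zero, which reflects why a self-adjusting structure can serve heavily skewed query sequences for little more than the cost of reading the queries.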