Static Optimality Theorem for External Memory String Access

  • Authors:
  • Valentina Ciriani;Paolo Ferragina;Fabrizio Luccio;S. Muthukrishnan

  • Affiliations:
  • -;-;-;-

  • Venue:
  • FOCS '02 Proceedings of the 43rd Symposium on Foundations of Computer Science
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data warehouses are increasingly storing and managing large scale string data, and dealing with large volume of transactions that update and search string data. Motivated by this context, we initiate the study of self-adjusting data structures for string dictionary operations, that is, data structures that are designed to be efficient on an entire sequence rather than individual string operations. Furthermore, we study this problem in the external memory modelwhere string data is too massive to be stored in internal memory and has to reside in disks; each access to a disk page fetches B items, and the cost of the operations is the number of pages accessed (I/Os).We show that given a dictionary of n strings S_1 , \ldots ,S_n of total length N, asequence of m string searches S_{i_1 } ,S_{i_2 } , \ldots ,S_{i_m } takes 0(\sum\limits_{j = 1}^m {(\frac{{\left| {S_{i_j } } \right|}}{B}}+ \sum\limits_{i = 1}^n {(n_i } \log _B \frac{m}{{n_i }})) expected I/Os, where ni is the number of times Si is queried. Inserting or deleting a string S takes 0(\frac{{\left| S \right|}}{B} + \log _B n) expected amortized I/Os. The \sum\nolimits_{j = 1}^m \frac{{\left| {S_{i_j } } \right|}}{B} term is a lower bound for reading the input; the \sum\nolimits_{i = 1}^n {n_i } \log _B \frac{m}{{n_i }} term is the entropy of the query sequence and is a standard information-theoretic lower bound. This is the Static Optimality Theorem for external-memory string access. The static optimality theorem was first formalized and proved by Tarjan and Sleator for numerical dictionary in their classic "splay" trees paper in 1985 [16]; they left open the B-tree case for numerical values (page 684), and a fortiori, the case of string data in an external-memory setting, that we settle here. We obtain our result not by using traditional "splay" operations on search trees as in [16], but by designing a novel and conceptually simple self-adjusting data structure based on the well-known skip lists.