Optimal indexing using near-minimal space

  • Authors:
  • C. Heeren;H. V. Jagadish;L. Pitt

  • Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, IL;University of Michigan, Ann Arbor, MI;University of Illinois at Urbana-Champaign, Urbana, IL

  • Venue:
  • Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2003

Abstract

We consider the index selection problem. Given either a fixed query workload or an unknown probability distribution on possible future queries, and a bound B on how much space is available to build indices, we seek to build a collection of indices for which the average query response time is minimized. We give strong negative and positive performance bounds.

Let m be the number of queries in the workload. We show how to obtain, with high probability, a collection of indices using space O(B ln m) for which the average query cost is opt_B, the optimal performance possible for indices using at most B total space. Moreover, this space relaxation is necessary: unless NP ⊆ n^{O(log log n)}, no polynomial-time algorithm can guarantee average query cost less than M^{1-ε} opt_B using space αB, for any constant α, where M is the size of the dataset. We quantify the error in performance introduced by running the algorithm on a sample drawn from a query distribution.
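
To make the objective concrete, the following is a minimal sketch of the index selection problem on a toy instance. It is not the algorithm from the paper: it finds the optimal average cost for a space budget by exhaustive search, which takes exponential time. The cost model (each query is answered by the cheapest chosen index, or by a full scan at cost scan_cost), the function names, and all the numbers are assumptions made for illustration only.

```python
from itertools import combinations


def average_cost(chosen, query_costs, scan_cost):
    """Average cost over the workload when only the `chosen` indices exist.

    Assumed cost model: a query uses the cheapest chosen index that can
    serve it, and falls back to a full scan costing `scan_cost` otherwise.
    """
    total = 0.0
    for costs in query_costs:
        usable = [costs[i] for i in chosen if i in costs]
        total += min(usable + [scan_cost])
    return total / len(query_costs)


def brute_force_opt(index_sizes, query_costs, scan_cost, budget):
    """Exhaustively search all index sets whose total size fits in `budget`.

    Returns (best average cost, best index set). Exponential in the number
    of candidate indices, so it is only usable on toy instances.
    """
    names = list(index_sizes)
    best_cost, best_set = float("inf"), ()
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            if sum(index_sizes[i] for i in subset) > budget:
                continue  # violates the space bound
            cost = average_cost(subset, query_costs, scan_cost)
            if cost < best_cost:
                best_cost, best_set = cost, subset
    return best_cost, best_set


# Hypothetical toy instance: three candidate indices and three queries.
index_sizes = {"a": 4, "b": 3, "c": 2}
query_costs = [
    {"a": 1, "b": 5},  # query 1 can use index a or index b
    {"b": 2, "c": 6},  # query 2 can use index b or index c
    {"a": 1, "c": 3},  # query 3 can use index a or index c
]

if __name__ == "__main__":
    for budget in (5, 7):
        cost, chosen = brute_force_opt(index_sizes, query_costs, 10, budget)
        print(budget, chosen, round(cost, 3))
```

On this toy instance, raising the budget from 5 to 7 changes the best index set from (b, c) to (a, b) and lowers the best average cost from 10/3 to 4/3. This illustrates, on a small scale, why allowing extra space can buy better query performance.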