Two-dimensional substring indexing

  • Authors:
  • Paolo Ferragina;Nick Koudas;S. Muthukrishnan;Divesh Srivastava

  • Affiliations:
  • Dipartimento di Informatica, University of Pisa, Corso Italia 40, 56125 Pisa, Italy;AT&T Labs-Research, 180 Park Avenue, Building 103, Florham Park, NJ;AT&T Labs-Research, 180 Park Avenue, Building 103, Florham Park, NJ;AT&T Labs-Research, 180 Park Avenue, Building 103, Florham Park, NJ

  • Venue:
  • Journal of Computer and System Sciences - Special issue on PODS 2001
  • Year:
  • 2003

Abstract

As databases have expanded in scope to storing string data (XML documents, product catalogs), it has become increasingly important to search databases based on matching substrings, often on multiple, correlated dimensions. While string B-trees are I/O optimal in one dimension, no index structure with non-trivial query bounds is known for two-dimensional substring indexing. In this paper, we present a technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points. We develop an I/O efficient algorithm for solving the common colors problem, and use it to obtain an I/O efficient (poly-logarithmic query time) algorithm for the two-dimensional substring indexing problem. Our techniques result in a family of secondary memory index structures that trade space for time, with no loss of accuracy.
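The reduction mentioned in the abstract can be illustrated with a small, naive in-memory sketch, assuming it works roughly as follows. Each record has two string attributes, and the record id serves as a "color". For each dimension, every suffix of every attribute value becomes a point, and the points are sorted lexicographically. The suffixes that begin with a query pattern then form one contiguous range. A two-dimensional substring query becomes a common-colors query: report the colors that appear in both ranges. The function names (`build_suffix_points`, `prefix_range`, `common_colors`, `two_dim_substring_query`) are hypothetical. Sorted Python lists stand in for string B-trees, and a plain set intersection stands in for the paper's I/O-efficient common-colors structure, so this sketch has none of the paper's space or query-time guarantees.

```python
from bisect import bisect_left

def build_suffix_points(strings):
    """Sorted list of (suffix, color) points; the color is the record id.
    Naive stand-in for a string B-tree over all suffixes."""
    points = []
    for rid, s in enumerate(strings):
        for i in range(len(s)):
            points.append((s[i:], rid))
    points.sort()
    return points

def prefix_range(points, pattern):
    """Half-open range [lo, hi) of sorted suffixes that start with pattern.
    Suffixes sharing a prefix are contiguous, and the first one is at bisect_left."""
    keys = [suf for suf, _ in points]
    lo = bisect_left(keys, pattern)
    hi = lo
    while hi < len(keys) and keys[hi].startswith(pattern):
        hi += 1
    return lo, hi

def common_colors(points1, range1, points2, range2):
    """Colors present in both ranges (naive set intersection, not I/O efficient)."""
    colors1 = {rid for _, rid in points1[range1[0]:range1[1]]}
    colors2 = {rid for _, rid in points2[range2[0]:range2[1]]}
    return sorted(colors1 & colors2)

def two_dim_substring_query(records, p1, p2):
    """Ids of records whose first field contains p1 and second field contains p2."""
    dim1 = build_suffix_points([a for a, _ in records])
    dim2 = build_suffix_points([b for _, b in records])
    return common_colors(dim1, prefix_range(dim1, p1),
                         dim2, prefix_range(dim2, p2))

records = [("john smith", "new york"),
           ("jane smithers", "newark"),
           ("bob jones", "york")]
print(two_dim_substring_query(records, "smith", "york"))  # [0]
print(two_dim_substring_query(records, "smith", "new"))   # [0, 1]
```

The two substring conditions are correlated only through the shared record id. This is why each dimension can be indexed independently and the results combined by a common-colors query instead of a join over substring matches.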