On effective multi-dimensional indexing for strings

  • Authors:
  • H. V. Jagadish;Nick Koudas;Divesh Srivastava

  • Affiliations:
  • University of Michigan;AT&T Labs-Research;AT&T Labs-Research

  • Venue:
  • SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

As databases have expanded in scope from storing purely business data to include XML documents, product catalogs, e-mail messages, and directory data, it has become increasingly important to search databases based on wild-card string matching: prefix matching, for example, is more common (and useful) than exact matching, for such data. In many cases, matches need to be on multiple attributes/dimensions, with correlations between the dimensions. Traditional multi-dimensional index structures, designed with (fixed length) numeric data in mind, are not suitable for matching unbounded length string data.In this paper, we describe a general technique for adapting a multi-dimensional index structure for wild-card indexing of unbounded length string data. The key ideas are (a) a carefully developed mapping function from strings to rational numbers, (b) representing an unbounded length string in an index leaf page by a fixed length offset to an external key, and (c) storing multiple elided tries, one per dimension, in an index page to prune search during traversal of index pages. These basic ideas affect all index algorithms. In this paper, we present efficient algorithms for different types of string matching.While our technique is applicable to a wide range of multi-dimensional index structures, we instantiate our generic techniques by adapting the 2-dimensional R-tree to string data. We demonstrate the space effectiveness and time benefits of using the string R-tree both analytically and experimentally.