String Matching with Metric Trees Using an Approximate Distance

  • Authors:
  • Ilaria Bartolini;Paolo Ciaccia;Marco Patella

  • Venue:
  • SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
  • Year:
  • 2002

Abstract

Searching a large data set for the strings that are most similar, according to the edit distance, to a given string is a time-consuming process. In this paper we investigate the performance of metric trees, namely the M-tree, when they are extended with a cheap approximate distance function that acts as a filter to quickly discard irrelevant strings. Using the bag distance as an approximation of the edit distance, we show performance improvements of up to 90% with respect to the basic case. This, along with the fact that our solution is independent of both the distance used in the pre-test and the underlying metric index, demonstrates that metric indices are a powerful solution, not only for many modern application areas, such as multimedia, data mining and pattern recognition, but also for the string matching problem.
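
The filtering idea in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of the bag distance (the larger of the two multiset differences between the character bags of the strings), which never exceeds the edit distance and so can safely discard candidates in a range query. The function names and the linear scan are illustrative assumptions only; the paper applies the pre-test inside an M-tree, which is not reproduced here.

```python
from collections import Counter


def bag_distance(x, y):
    """Cheap lower bound on the edit distance, computed from character bags."""
    cx, cy = Counter(x), Counter(y)
    # Counter subtraction keeps only positive counts (multiset difference).
    return max(sum((cx - cy).values()), sum((cy - cx).values()))


def edit_distance(x, y):
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, 1):
        curr = [i]
        for j, b in enumerate(y, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (a != b)))  # substitution or match
        prev = curr
    return prev[-1]


def range_search(query, strings, radius):
    """Hypothetical linear-scan range query with a bag-distance pre-test.

    Returns the matching strings and the number of edit-distance
    computations that the pre-test did not avoid.
    """
    results, edit_calls = [], 0
    for s in strings:
        if bag_distance(query, s) > radius:
            continue  # safe to discard: the edit distance is at least as large
        edit_calls += 1
        if edit_distance(query, s) <= radius:
            results.append(s)
    return results, edit_calls


words = ["spire", "spine", "prise", "string", "metric", "tree"]
matches, edit_calls = range_search("spire", words, radius=1)
print(matches, edit_calls)
```

In this toy run the pre-test discards three of the six strings before any edit distance is computed. "prise" passes the filter, because it has the same character bag as "spire", and is rejected only by the exact distance, which shows that the bag distance is an approximation and not a replacement.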