Optimal Exact Strring Matching Based on Suffix Arrays

  • Authors:
  • Mohamed Ibrahim Abouelhoda;Enno Ohlebusch;Stefan Kurtz

  • Affiliations:
  • -;-;-

  • Venue:
  • SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Using the suffix tree of a string S, decision queries of the type "Is P a substring of S?" can be answered in O(|P|) time and enumeration queries of the type "Where are all z occurrences of P in S?" can be answered in O(|P|+z) time, totally independent of the size of S. However, in large scale applications as genome analysis, the space requirements of the suffix tree are a severe drawback. The suffix array is a more space economical index structure. Using it and an additional table, Manber and Myers (1993) showed that decision queries and enumeration queries can be answered in O(|P|+log |S|) and O(|P|+log |S|+z) time, respectively, but no optimal time algorithms are known. In this paper, we show how to achieve the optimal O(|P|) and O(|P| + z) time bounds for the suffix array. Our approach is not confined to exact pattern matching. In fact, it can be used to efficiently solve all problems that are usually solved by a top-down traversal of the suffix tree. Experiments show that our method is not only of theoretical interest but also of practical relevance.