An algorithm for string edit distance allowing substring reversals

  • Authors:
  • Abdullah N. Arslan

  • Affiliations:
  • University of Vermont

  • Venue:
  • BIBE '06 Proceedings of the Sixth IEEE Symposium on BioInformatics and BioEngineering
  • Year:
  • 2006

Abstract

The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y. Ordinarily, string editing is based on character insert, delete, and substitute operations. It has been suggested that extending this model with block (substring) edits would be useful in applications such as DNA sequence comparison. In its general form, the resulting problem is NP-hard. However, there are efficient algorithms when string edits include only character and block replacements. We introduce a new edit model which permits insertions, deletions, and substitutions at the character level, and also reversals of substrings. We present an algorithm whose worst-case time complexity is O(n^2 m), where n = |X| ≤ m = |Y|, and we prove that the average running time of the algorithm is O(nm). Our experiments on randomly generated strings verify these results. The main contribution of this paper is an algorithm, fast on average, that finds all possible reversals using a generalized suffix tree.
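
To make the edit model concrete, the following is a minimal brute-force sketch in Python. It is not the paper's algorithm and uses no generalized suffix tree. It rests on assumptions that the abstract does not state: every operation, including a reversal, has unit cost; a reversed substring of X must match a substring of Y exactly; and reversed blocks do not overlap and are not edited further. The function name and its parameters are hypothetical, chosen only for illustration.

```python
def edit_distance_with_reversals(x, y, allow_reversals=True, reversal_cost=1):
    """Brute-force dynamic program for edit distance with substring reversals.

    Assumptions (not taken from the paper): unit-cost insert, delete, and
    substitute; a reversal of x[i-k:i] is allowed only when the reversed
    block equals y[j-k:j] exactly; reversed blocks are not edited further.
    """
    n, m = len(x), len(y)
    # d[i][j] = cost of transforming x[:i] into y[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete all of x[:i]
    for j in range(1, m + 1):
        d[0][j] = j  # insert all of y[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                d[i - 1][j] + 1,                           # delete x[i-1]
                d[i][j - 1] + 1,                           # insert y[j-1]
                d[i - 1][j - 1] + (x[i - 1] != y[j - 1]),  # match or substitute
            )
            if allow_reversals:
                # Try every block length k >= 2 ending at positions i and j.
                for k in range(2, min(i, j) + 1):
                    if x[i - k:i][::-1] == y[j - k:j]:
                        best = min(best, d[i - k][j - k] + reversal_cost)
            d[i][j] = best
    return d[n][m]


if __name__ == "__main__":
    print(edit_distance_with_reversals("abcd", "dcba"))
    print(edit_distance_with_reversals("abcd", "dcba", allow_reversals=False))
```

Under these assumptions, transforming "abcd" into "dcba" costs a single reversal, whereas the character-only model needs four operations. The sketch checks every candidate block by direct comparison, so it is slower than the bounds quoted in the abstract; it is intended only as a reference for small inputs.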