Edit distance with duplications and contractions revisited

  • Authors:
  • Tamar Pinhas; Dekel Tsur; Shay Zakov; Michal Ziv-Ukelson

  • Affiliations:
  • Department of Computer Science, Ben-Gurion University of the Negev, Israel (all four authors)

  • Venue:
  • CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
  • Year:
  • 2011

Abstract

In this paper, we propose three algorithms for the problem of string edit distance with duplication and contraction operations, each of which improves on the time complexity of previous algorithms for this problem. These comprise a faster algorithm for the general case of the problem and two improvements that apply under certain assumptions on the cost function. The general algorithm is based on fast min-plus multiplication of square matrices and runs in $O(|\Sigma| n^3 \log^3 \log n / \log^2 n)$ time, where $n$ is the length of the input strings and $|\Sigma|$ is the alphabet size. Under an additional assumption on the cost function, this algorithm is further accelerated to $O(|\Sigma| (n^2 + n n'^2 \log^3 \log n' / \log^2 n'))$ time, where $n'$ is the length of the run-length encoding of the input. Another improvement is based on a new fast min-plus matrix-vector multiplication under a certain discreteness assumption, and yields an $O(|\Sigma| n^3 / \log^2 n)$ time algorithm. Furthermore, this algorithm is online, in the sense that one of the strings may be given letter by letter. As part of this algorithm, we present the currently fastest online algorithm for weighted CFG parsing for discrete weighted grammars, a result that is useful in its own right.
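The running times above are stated in terms of two primitives: the (min,+) product of matrices (and of a matrix with a vector), and the run-length encoding whose length is $n'$. The following is a minimal Python sketch of the textbook definitions of these primitives, assuming dense matrices given as lists of lists with `float("inf")` marking absent entries. The function names are hypothetical, and these are the naive cubic and quadratic loops that fast methods improve upon, not the accelerated algorithms of the paper.

```python
INF = float("inf")  # stands for an absent entry (no finite cost)


def min_plus_product(A, B):
    """Naive (min,+) matrix product: C[i][j] = min over k of A[i][k] + B[k][j].

    Takes O(n^3) time for n x n inputs; fast min-plus methods beat this bound.
    """
    n, m = len(A), len(B)
    p = len(B[0]) if B else 0
    if any(len(row) != m for row in A):
        raise ValueError("inner dimensions do not match")
    C = [[INF] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            if a == INF:  # this k cannot improve any entry of row i
                continue
            row_b = B[k]
            for j in range(p):
                s = a + row_b[j]
                if s < C[i][j]:
                    C[i][j] = s
    return C


def min_plus_matvec(A, v):
    """Naive (min,+) matrix-vector product: u[i] = min over k of A[i][k] + v[k]."""
    return [min((a + x for a, x in zip(row, v)), default=INF) for row in A]


def run_length_encode(s):
    """Return the runs of s as (letter, run length) pairs; len(result) is n'."""
    runs = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))  # start a new run
    return runs
```

For example, `min_plus_product([[0, 3], [2, 0]], [[1, 5], [0, 2]])` returns `[[1, 5], [0, 2]]`, and `run_length_encode("aaabccdd")` returns four runs, so for that string $n = 8$ while $n' = 4$. Bounds of the form $n n'^2$ pay off exactly when inputs contain long runs of repeated letters, which is where duplications and contractions act.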