Quick greedy computation for minimum common string partitions

  • Authors:
  • Isaac Goldstein; Moshe Lewenstein

  • Affiliations:
  • Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel; Department of Computer Science, Bar-Ilan University, Ramat Gan, Israel

  • Venue:
  • CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
  • Year:
  • 2011

Abstract

In the minimum common string partition problem, one is given two strings S and T with the same character statistics and seeks the smallest partition of S into substrings such that T can also be partitioned into the same multiset of substrings. The problem is fundamental to several variants of edit distance with block operations, e.g., signed reversal distance with duplicates and edit distance with moves. The minimum common string partition problem is known to be NP-complete, and the best known approximation factor is of order O(log n log* n). Since this problem is of utmost practical importance, one seeks a heuristic that (1) usually has a low approximation factor and (2) runs fast. A simple greedy algorithm is known and has been well studied from an approximation point of view. It has been shown to have a bad worst-case approximation factor. However, all the bad approximation factors presented so far stem from complicated recursive constructions. In practice, the greedy algorithm seems to have small approximation factors. However, the best current implementation of greedy runs in quadratic time. We propose a novel method to implement greedy in linear time.
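
For illustration only, here is a minimal Python sketch of a greedy heuristic for this problem. The abstract does not spell out the greedy rule, so the sketch assumes the commonly described one: repeatedly take the longest substring common to the still-unmarked parts of S and T, mark it in both strings, and continue until everything is marked. The function name `greedy_mcsp` and the tie-breaking order are hypothetical choices. This is a naive dynamic-programming version that may take cubic time in the worst case; it is not the linear-time method proposed in the paper, nor the quadratic implementation the abstract mentions.

```python
def greedy_mcsp(s, t):
    """Naive greedy common partition (illustrative sketch, not the paper's method).

    Returns a list of blocks (start_in_s, start_in_t, length).
    Assumed greedy rule: always match the longest common unmarked substring.
    """
    if sorted(s) != sorted(t):
        raise ValueError("inputs must have the same character counts")
    n = len(s)
    used_s = [False] * n  # positions of s already covered by a block
    used_t = [False] * n  # positions of t already covered by a block
    blocks = []
    remaining = n
    while remaining > 0:
        best = (0, 0, 0)  # (length, end index in s, end index in t)
        # prev[j + 1]: longest common unmarked run ending at s[i - 1] and t[j]
        prev = [0] * (n + 1)
        for i in range(n):
            cur = [0] * (n + 1)
            if not used_s[i]:
                for j in range(n):
                    if not used_t[j] and s[i] == t[j]:
                        cur[j + 1] = prev[j] + 1
                        if cur[j + 1] > best[0]:
                            best = (cur[j + 1], i, j)
            prev = cur
        length, i_end, j_end = best
        i0, j0 = i_end - length + 1, j_end - length + 1
        for k in range(length):
            used_s[i0 + k] = True
            used_t[j0 + k] = True
        blocks.append((i0, j0, length))
        remaining -= length
    return blocks


s, t = "abcab", "ababc"
blocks = greedy_mcsp(s, t)
print([s[i:i + length] for i, _, length in blocks])  # ['abc', 'ab']
```

In this small example the greedy choice yields a common partition of size 2, which is also optimal here because S and T differ; as the abstract notes, greedy can do much worse on adversarial inputs.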