The Minimum Common String Partition problem (MCSP) is to partition two given input strings into the same collection of substrings such that the number of substrings in the partition is minimized. It is a key problem in genome rearrangement and is closely related to the problem of sorting by reversals with duplicates. MCSP is NP-hard even in the simplest case, 2-MCSP, where each letter occurs at most twice in each input string. Several approximation algorithms achieve good approximation ratios but are complicated to implement, for example a 1.5-approximation algorithm for 2-MCSP, a 1.1037-approximation algorithm for 2-MCSP, and a 4-approximation algorithm for 3-MCSP. There is also a simple greedy algorithm for MCSP that extracts the longest common substring from the given strings at each step. In this paper, we propose a novel greedy algorithm for MCSP that, at each step, extracts the longest common substring containing a symbol that occurs only once, whenever such a symbol exists. We show that our algorithm is more "worst case" greedy at each step than the simple greedy algorithm and that its expected performance is better. Our experiments show that our method achieves a better partition on average than the simple greedy algorithm. Our algorithm is also much faster than the simple greedy algorithm.
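As an illustration of the simple greedy baseline described above (not the novel algorithm proposed in the paper), the following minimal Python sketch repeatedly extracts the longest common substring of the still-unmatched parts of both strings. The function names `longest_common_unmarked` and `greedy_mcsp`, the quadratic dynamic program, and the tie-breaking rule (first longest match found) are assumptions made for this example and are not taken from the paper.

```python
def longest_common_unmarked(a, b, used_a, used_b):
    """Return (length, i, j) of a longest common substring a[i:i+length] ==
    b[j:j+length] that uses only positions not yet marked in either string."""
    best = (0, -1, -1)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # A marked position resets the run, so blocks never overlap.
            if a[i - 1] == b[j - 1] and not used_a[i - 1] and not used_b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def greedy_mcsp(a, b):
    """Greedy common partition: list of blocks (start_in_a, start_in_b, length)."""
    # A common partition exists only if both strings have the same letters
    # with the same multiplicities.
    if sorted(a) != sorted(b):
        raise ValueError("input strings must contain the same multiset of letters")
    used_a = [False] * len(a)
    used_b = [False] * len(b)
    blocks = []
    remaining = len(a)
    while remaining > 0:
        length, i, j = longest_common_unmarked(a, b, used_a, used_b)
        blocks.append((i, j, length))
        for k in range(length):
            used_a[i + k] = True
            used_b[j + k] = True
        remaining -= length
    return blocks


if __name__ == "__main__":
    # "abcab" and "ababc" share the block "abc", leaving "ab" in both strings.
    print(greedy_mcsp("abcab", "ababc"))
```

The number of blocks returned is the size of the partition found. Each step of this sketch costs time proportional to the product of the string lengths, which is adequate for small examples only.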