An algorithm with linear expected running time for string editing with substitutions and substring reversals

Authors:
Abdullah N. Arslan
Affiliations:
Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
Venue:
Information Processing Letters
Year:
2008

References

Block edit models for approximate string matching. Theoretical Computer Science, special issue: Latin American theoretical informatics.
Algorithms on strings, trees, and sequences: computer science and computational biology (book).
Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. Journal of the ACM (JACM).
Efficient algorithms for approximate string matching with swaps. Journal of Complexity.
The String-to-String Correction Problem. Journal of the ACM (JACM).
Approximate nearest neighbors and sequence comparison with block operations. STOC '00: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing.
An efficient algorithm for sequence comparison with block reversals. Theoretical Computer Science, Latin American theoretical informatics.
An algorithm for string edit distance allowing substring reversals. BIBE '06: Proceedings of the Sixth IEEE Symposium on BioInformatics and BioEngineering.


Abstract

The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y, where no two operations may involve the same position. Ordinarily, string editing is based on character insertion, deletion, and substitution. Motivated by the facts that substring reversals are observed in genomic sequences and that it is not always possible to transform a given sequence X into a given sequence Y by reversals alone (e.g., when X is all 0's and Y is all 1's), Muthukrishnan and Sahinalp [S. Muthukrishnan, S.C. Sahinalp, Approximate nearest neighbors and sequence comparison with block operations, in: Proc. ACM Symposium on Theory of Computing (STOC), 2000, pp. 416-424; S. Muthukrishnan, S.C. Sahinalp, An improved algorithm for sequence comparison with block reversals, Theoretical Computer Science 321 (1) (2004) 95-101] considered a "simple", well-defined edit distance model in which the edit operations are: replace a character, and reverse and replace a substring. A substring of X can be reversed only if the reversed substring exactly matches the substring of Y at the same positions. Each character replacement and each substring reversal costs 1. In this model the distance is defined only when |X| = |Y| = n. There is an algorithm that computes the distance in this model in O(n log^2 n) worst-case time [S. Muthukrishnan, S.C. Sahinalp, An improved algorithm for sequence comparison with block reversals, Theoretical Computer Science 321 (1) (2004) 95-101]. We present a dynamic programming algorithm whose worst-case time complexity is O(n^2) but whose expected running time is O(n). In our dynamic programming solution, substitution weights may differ for different pairs of characters, and the cost of a reversal may be a function of the reversal length.
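A dynamic program for this model can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: it uses 0-based indexing, hypothetical names `valid_reversals`, `reversal_edit_distance`, `sub(a, b)` (substitution cost, 0 when a == b) and `rev(length)` (reversal cost), and it only considers reversals of length at least 2. Because no two operations may share a position, an optimal edit of the prefixes X[0..i] and Y[0..i] ends either with a substitution (or match) at position i or with a reversal of some block X[j..i], giving D[i+1] = min(D[i] + sub(X[i], Y[i]), min over valid blocks [j, i] of D[j] + rev(i - j + 1)). A block [j, i] is valid when X[j+k] = Y[i-k] for every k, so, as with palindromes, the valid blocks sharing a center form a contiguous run and can be found by expanding outward from each center until the first mismatch. If X = Y = 0^n, the expansions find about n^2/2 blocks, consistent with an O(n^2) worst case; assuming independent, uniformly random characters over an alphabet of size σ, each expansion step succeeds with probability 1/σ^2, so the expected number of blocks per center is constant and the expected total work is O(n). This is one plausible route to linear expected time; the paper's own analysis may differ.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def valid_reversals(x: str, y: str) -> Dict[int, List[int]]:
    """Map each right end i to every left end j < i such that reversing
    x[j..i] (0-based, inclusive) yields exactly y[j..i]."""
    n = len(x)
    ends: Dict[int, List[int]] = defaultdict(list)

    def expand(j: int, i: int) -> None:
        # Grow the block one position on each side while the outer pair
        # still matches crosswise: x[j] == y[i] and x[i] == y[j].
        while j >= 0 and i < n and x[j] == y[i] and x[i] == y[j]:
            ends[i].append(j)
            j, i = j - 1, i + 1

    for c in range(n):
        # Odd-length blocks centred at c: the middle character stays in
        # place, so it must already match.
        if x[c] == y[c]:
            expand(c - 1, c + 1)
        # Even-length blocks centred between c and c + 1.
        expand(c, c + 1)
    return ends


def reversal_edit_distance(
    x: str,
    y: str,
    sub: Callable[[str, str], float] = lambda a, b: 0 if a == b else 1,
    rev: Callable[[int], float] = lambda length: 1,
) -> float:
    """Minimum cost of transforming x into y with non-overlapping
    substitutions and block reversals (hypothetical cost parameters)."""
    if len(x) != len(y):
        raise ValueError("the distance is defined only when |X| == |Y|")
    n = len(x)
    ends = valid_reversals(x, y)
    # d[k] = cost of transforming the prefix x[:k] into y[:k].
    d: List[float] = [0.0] * (n + 1)
    for i in range(n):
        # Option 1: substitute (or keep) the character at position i.
        best = d[i] + sub(x[i], y[i])
        # Option 2: finish with a valid reversal of the block x[j..i].
        for j in ends.get(i, []):
            best = min(best, d[j] + rev(i - j + 1))
        d[i + 1] = best
    return d[n]
```

For example, `reversal_edit_distance("abxy", "bayx")` uses two block reversals and returns 2, while `reversal_edit_distance("0000", "1111")` can only use substitutions and returns 4. Storing valid blocks by right end keeps the DP loop simple; memory is proportional to the number of valid blocks, which is O(n) in expectation under the random-string assumption above.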