Block edit models for approximate string matching
Theoretical Computer Science - Special issue: Latin American theoretical informatics
Algorithms on strings, trees, and sequences: computer science and computational biology
Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals
Journal of the ACM (JACM)
Efficient algorithms for approximate string matching with swaps
Journal of Complexity
The String-to-String Correction Problem
Journal of the ACM (JACM)
Approximate nearest neighbors and sequence comparison with block operations
STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
An efficient algorithm for sequence comparison with block reversals
Theoretical Computer Science - Latin American theoretical informatics
An algorithm for string edit distance allowing substring reversals
BIBE '06 Proceedings of the Sixth IEEE Symposium on BioInformatics and BioEngineering
The edit distance between two given strings X and Y is the minimum number of edit operations that transform X into Y without performing multiple operations that involve the same position. Ordinarily, string editing is based on character insert, delete, and substitute operations. Motivated by the facts that substring reversals are observed in genomic sequences, and that it is not always possible to transform a given sequence X into a given sequence Y by reversals alone (e.g., X is all 0's and Y is all 1's), Muthukrishnan and Sahinalp [S. Muthukrishnan, S.C. Sahinalp, Approximate nearest neighbors and sequence comparison with block operations, in: Proc. ACM Symposium on Theory of Computing (STOC), 2000, pp. 416-424; S. Muthukrishnan, S.C. Sahinalp, An improved algorithm for sequence comparison with block reversals, Theoretical Computer Science 321 (1) (2004) 95-101] considered a "simple", well-defined edit distance model in which the edit operations are: replace a character, and reverse and replace a substring. A substring of X can be reversed only if the reversal results in a match at the same position in Y. Each character replacement and each substring reversal has cost 1. The distance in this model is defined only when |X| = |Y| = n. There is an algorithm that computes the distance in this model with worst-case time complexity O(n log^2 n) [S. Muthukrishnan, S.C. Sahinalp, An improved algorithm for sequence comparison with block reversals, Theoretical Computer Science 321 (1) (2004) 95-101]. We present a dynamic programming algorithm with worst-case time complexity O(n^2) and expected running time O(n). In our dynamic programming solution, the weights of the edit operations can vary for different substitutions, and the cost of a reversal can be a function of the reversal length.
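To make the distance model concrete, the following is a minimal Python sketch of a straightforward O(n^2) dynamic program for the unit-cost case. It is a baseline written from the model description above, not the algorithm presented in the paper, and it does not achieve the O(n) expected running time. The function name `reversal_edit_distance` and the helper list `rev_starts` are illustrative names introduced here.

```python
def reversal_edit_distance(x: str, y: str) -> int:
    """Unit-cost distance with character replacements and substring reversals.

    Baseline O(n^2) sketch (an assumption-laden illustration, not the
    paper's algorithm). A reversal of x[i..j] is allowed only if the
    reversed substring equals y[i..j]; operations may not overlap.
    """
    if len(x) != len(y):
        raise ValueError("the distance is defined only when |X| = |Y|")
    n = len(x)

    # rev_starts[j] lists every i < j such that reversing x[i..j]
    # (inclusive, 0-based) yields y[i..j]. Reversal matches are symmetric
    # around a centre, so each centre is expanded outwards.
    rev_starts = [[] for _ in range(n)]
    for s in range(2 * n - 1):          # s = i + j identifies the centre
        i, j = s // 2, s - s // 2
        while i >= 0 and j < n and x[i] == y[j] and x[j] == y[i]:
            if i < j:                   # length-1 "reversals" are no-ops
                rev_starts[j].append(i)
            i -= 1
            j += 1

    # d[k] = distance between the prefixes of length k.
    d = [0] * (n + 1)
    for k in range(1, n + 1):
        # Option 1: align the last characters, replacing if they differ.
        d[k] = d[k - 1] + (x[k - 1] != y[k - 1])
        # Option 2: end the prefix with one admissible reversal x[i..k-1].
        for i in rev_starts[k - 1]:
            d[k] = min(d[k], d[i] + 1)
    return d[n]
```

For example, `reversal_edit_distance("abc", "cba")` uses a single reversal of the whole string, whereas `reversal_edit_distance("0000", "1111")` needs four replacements because no reversal produces a match. Supporting substitution weights or length-dependent reversal costs, as the abstract describes, would amount to replacing the two unit costs in the recurrence with cost functions.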