A better edit distance measure allowing for block swaps

  • Authors:
  • Nhauo Davuth;Sung-Ryul Kim

  • Affiliations:
  • Fusion Konkuk University, South Korea;Fusion Konkuk University, South Korea

  • Venue:
  • Proceedings of the 2013 Research in Adaptive and Convergent Systems
  • Year:
  • 2013

Abstract

Edit Distance, also known as the Levenshtein distance or evolutionary distance, is a concept from information retrieval that describes the number of edit operations required to change one string into another. It is one of the most common measures of the dissimilarity between two strings. Ordinarily, Edit Distance is based on three character operations: insertion, deletion, and substitution. With these three operations, Edit Distance addresses the problem of computing the similarity between two sequences, which arises in many areas. However, standard Edit Distance can miss the true relationship between two similar strings in some cases, because it is sensitive to the order in which common substrings appear. For example, the Edit Distance between "classbook" and "bookclass" is eight because the words "book" and "class" are reversed, yet intuitively the two strings seem much closer. To solve this problem, we propose an extended Edit Distance that permits a block swap operation. The main contribution of this paper is a method that computes cut points over a single string and then allows block swaps, which move substrings from one position to another within a string, so that common substrings appear in the right order. Our experiment indicates that Block Swap Edit Distance provides a better measure than standard Edit Distance.
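
The "classbook" / "bookclass" example can be reproduced with a short sketch. The first function is the standard dynamic-programming Levenshtein distance. The second, `single_swap_distance`, is a hypothetical simplification written only for illustration: it tries every single cut point in the first string, swaps the two resulting blocks at an assumed cost of 1, and keeps the cheapest result. It is not the cut-point method proposed in the paper, whose details the abstract does not give.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard edit distance using insert, delete, and substitute (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def single_swap_distance(a: str, b: str, swap_cost: int = 1) -> int:
    """Illustrative only (assumption, not the paper's algorithm):
    allow at most one swap of the two blocks a[:cut] and a[cut:]."""
    best = levenshtein(a, b)  # no swap at all
    for cut in range(1, len(a)):
        swapped = a[cut:] + a[:cut]  # move the leading block to the end
        best = min(best, swap_cost + levenshtein(swapped, b))
    return best


if __name__ == "__main__":
    print(levenshtein("classbook", "bookclass"))           # 8
    print(single_swap_distance("classbook", "bookclass"))  # 1
```

Under these assumptions the plain distance is 8, while one block swap at the cut between "class" and "book" turns the first string into the second, giving a distance of 1. The `swap_cost` parameter is an arbitrary choice here; a different cost model would change the result.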