Plagiarism Detection Using the Levenshtein Distance and Smith-Waterman Algorithm

  • Authors:
  • Zhan Su; Byung-Ryul Ahn; Ki-Yol Eom; Min-Koo Kang; Jin-Pyung Kim; Moon-Kyun Kim

  • Venue:
  • ICICIC '08 Proceedings of the 2008 3rd International Conference on Innovative Computing Information and Control
  • Year:
  • 2008

Abstract

Plagiarism in texts is an issue of increasing concern to the academic community. Today, the most common form of text plagiarism involves minor alterations such as the insertion, deletion, or substitution of words. Detecting even such simple changes, however, requires an excessive number of string comparisons. In this paper, we present a hybrid plagiarism detection method. We investigate the use of a diagonal line, derived from the Levenshtein distance, together with a simplified Smith-Waterman algorithm, a classical tool for identifying and quantifying local similarities in biological sequences, with a view to applying them to plagiarism detection. Our approach avoids globally involved string comparisons and takes psychological factors into account, which yields a significant speed-up in our experiments. Based on these results, we show the practicality of this improvement using the Levenshtein distance and the Smith-Waterman algorithm and illustrate the efficiency gains. In the future, it would be interesting to explore appropriate heuristics in the area of text comparison.
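
For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal sketch of the textbook word-level Levenshtein distance and a basic Smith-Waterman local alignment score. It is not the paper's hybrid method: the diagonal-line heuristic and the authors' simplification of Smith-Waterman are not described in the abstract and are not reproduced here. The function names and the scoring parameters (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration only.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Textbook edit distance: the minimum number of insertions,
    deletions, and substitutions needed to turn sequence a into b."""
    # prev[j] holds the distance between the processed prefix of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning i items of a into an empty prefix costs i deletions
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def smith_waterman(a: Sequence[str], b: Sequence[str],
                   match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Basic Smith-Waterman: score of the best local alignment.
    The scoring values are illustrative assumptions, not the paper's."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Clamping at 0 lets an alignment restart anywhere, which is
            # what makes the comparison local rather than global.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog".split()
    suspect = "a quick brown fox leaps over the lazy dog".split()
    print("edit distance:", levenshtein(original, suspect))
    print("local alignment score:", smith_waterman(original, suspect))
```

Both functions fill a full dynamic-programming table, one row at a time, so each comparison costs time proportional to the product of the two lengths. That cost is the kind of exhaustive string comparison the abstract says the hybrid approach aims to avoid.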