Plagiarism Detection Using the Levenshtein Distance and Smith-Waterman Algorithm

Authors:
Zhan Su;Byung-Ryul Ahn;Ki-Yol Eom;Min-Koo Kang;Jin-Pyung Kim;Moon-Kyun Kim
Venue:
ICICIC '08 Proceedings of the 2008 3rd International Conference on Innovative Computing Information and Control
Year:
2008

Abstract

Text plagiarism is an issue of increasing concern to the academic community. Most text plagiarism today is carried out through a variety of minor alterations, such as the insertion, deletion, or substitution of words. Detecting even such simple changes, however, requires an excessive number of string comparisons. In this paper, we present a hybrid plagiarism detection method. We investigate the use of a diagonal line derived from the Levenshtein distance, together with a simplified Smith-Waterman algorithm (a classical tool for identifying and quantifying local similarities in biological sequences), and apply them to plagiarism detection. Our approach avoids global string comparisons and takes psychological factors into account; experimental results show that it can yield a significant speed-up. Based on these results, we demonstrate the practicality of this improvement using the Levenshtein distance and the Smith-Waterman algorithm and illustrate the resulting efficiency gains. In future work, it would be interesting to explore appropriate heuristics for text comparison.
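The abstract builds on two standard dynamic-programming measures. The sketch below shows textbook versions of both, applied to word tokens rather than characters, to make the insert/delete/substitute model concrete. It is not the authors' hybrid method: the diagonal-line optimization and the psychological factors are not reproduced. The whitespace tokenizer and the scoring parameters (match = 2, mismatch = -1, linear gap = -1) are illustrative assumptions.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()


def levenshtein(a: List[str], b: List[str]) -> int:
    """Word-level edit distance: the minimum number of word insertions,
    deletions, or substitutions that turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a[i-1]
                          curr[j - 1] + 1,     # insert b[j-1]
                          prev[j - 1] + cost)  # match or substitute
        prev = curr
    return prev[len(b)]


def smith_waterman(a: List[str], b: List[str], match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> Tuple[int, int, int]:
    """Best local alignment score between word sequences a and b with a
    linear gap penalty. Returns (score, i, j), where (i, j) is the
    1-based cell in which the best-scoring local alignment ends."""
    best, best_i, best_j = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best:
                best, best_i, best_j = curr[j], i, j
        prev = curr
    return best, best_i, best_j


if __name__ == "__main__":
    source = tokenize("the quick brown fox jumps over the lazy dog")
    suspect = tokenize("a quick brown fox leaps over the lazy dog today")
    print(levenshtein(source, suspect))     # 3
    print(smith_waterman(source, suspect))  # (13, 9, 9)
```

In the example, the edit distance counts three word-level edits: two substitutions and one insertion. The local alignment score rewards the long shared run "quick brown fox ... over the lazy dog" despite the one substituted word. Both functions keep only one previous row, so memory is linear in the length of the second sequence, but time is still proportional to the product of the two lengths. That cost of exhaustive comparison is what the abstract says the proposed method tries to avoid.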