Korean documents copy detection based on ferret

Authors:
Byung Ryul Ahn;Won-gyum Kim;Won Young Yu;Moon-Hyun Kim
Affiliations:
Artificial Intelligence Lab, School of Computer Engineering, SungKyunKwan Univ., Suwon-si, South Korea;Copyright Protection Center, Seoul, South Korea;Contents Research Division, ETRI, Daejon, South Korea;Artificial Intelligence Lab, School of Computer Engineering, SungKyunKwan Univ., Suwon-si, South Korea
Venue:
ICIC'11 Proceedings of the 7th international conference on Advanced Intelligent Computing
Year:
2011

Abstract

With the development of electronic documents, plagiarism is increasing rapidly, and because manual detection is difficult, a need has emerged for plagiarism detection systems that help protect intellectual property. Many content-based detection systems have been developed and are in use in other countries, but they remain inadequate for documents written in Korean. In particular, the high variability of Hangul makes detection systems harder to develop. This study proposes a detection method for Hangul documents based on Ferret's trigrams. Ferret considers only the frequency of trigram matches when detecting similarity; in this study the approach is extended by weighting results according to the degree of trigram match, thereby improving the accuracy of similarity detection.
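
The contrast between counting exact trigram matches and weighting by degree of match can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the weighting scheme, so the `WEIGHTS` table, the position-wise `match_degree` function, and the whitespace tokenization are all assumptions chosen for illustration. The baseline follows the usual description of Ferret as a Jaccard-style resemblance over sets of word trigrams.

```python
def trigrams(text):
    """Return the set of word trigrams in text (whitespace tokenization is an assumption)."""
    words = text.split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def ferret_resemblance(a, b):
    """Baseline: distinct trigrams shared by both texts over all distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_degree(t1, t2):
    """Count the positions (0 to 3) at which two trigrams hold the same word."""
    return sum(x == y for x, y in zip(t1, t2))


# Hypothetical weights per degree of match; the paper's actual values are not given.
WEIGHTS = {3: 1.0, 2: 0.5, 1: 0.0, 0: 0.0}


def weighted_similarity(a, b, weights=WEIGHTS):
    """Score each trigram of `a` by its best partial match in `b`, then average."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    total = 0.0
    for t in ta:
        best = max(match_degree(t, u) for u in tb)
        total += weights[best]
    return total / len(ta)


if __name__ == "__main__":
    source = "the quick brown fox jumps"
    suspect = "the quick brown dog jumps"
    print(ferret_resemblance(source, suspect))
    print(weighted_similarity(source, suspect))
```

In this sketch a single substituted word breaks every exact trigram that contains it, so the baseline score drops sharply, while the weighted score still credits the trigrams that agree in two of three positions. This is one plausible reading of why partial-match weighting could help with text whose surface forms vary.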