Text plagiarism detection method based on path patterns

Authors:
Chun Kit See; Kuok-Shoong Wong; Wei Lee Woon
Affiliations:
Department of Information Technology, Malaysia University of Science and Technology, Unit GL33, Block C, Dataran Usahawan Kelana, 17 Jalan SS7/26, 47301 Petaling Jaya, Malaysia (all authors)
Venue:
International Journal of Business Intelligence and Data Mining
Year:
2008

Abstract

This paper extends the forward method of plagiarism detection to find the percentage of similarity between documents. We have developed an algorithm that quantifies similarity based on path patterns. The method is simple: it involves only ordinary mathematics, which simplifies application programming and speeds up processing. It converts each word into a number of steps using a newly proposed hash function, and these steps trace a walk on a mesh. The hash function guarantees that each distinct word maps to a unique number of steps, so the walk pattern on the mesh is unique to the text. Hence, a plagiarised version of a document will display a pattern on the mesh that is similar to that of the original document. This extended paper presents the algorithm in detail and compares its results with those of an available online plagiarism detection tool.
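
The abstract does not give the hash function, the mesh layout, or the similarity formula, so the following Python sketch only illustrates the general idea under stated assumptions. The names `word_to_steps`, `walk_pattern`, `similarity_percent` and the constant `MESH_WIDTH` are hypothetical, the hash is a stand-in (bijective base-26 numbering of lowercase words), and restarting the walk at each sentence is a choice made here, not something taken from the paper.

```python
import re

MESH_WIDTH = 101  # assumed side length of the square mesh (not from the paper)


def word_to_steps(word: str) -> int:
    """Stand-in hash: bijective base-26 numbering of a lowercase a-z word.

    Distinct non-empty words always yield distinct step counts.
    """
    steps = 0
    for ch in word:
        steps = steps * 26 + (ord(ch) - ord("a") + 1)
    return steps


def walk_pattern(text: str) -> set:
    """Return the set of mesh moves (from_cell, to_cell) traced by the text.

    Assumption: each sentence restarts at cell (0, 0) and every word advances
    the walker by its step count in row-major order, wrapping around the mesh.
    The wrap-around means different words can land on the same cell, so the
    uniqueness of the hash does not carry over fully to the pattern here.
    """
    cells = MESH_WIDTH * MESH_WIDTH
    moves = set()
    for sentence in re.split(r"[.!?]+", text.lower()):
        position = 0
        for word in re.findall(r"[a-z]+", sentence):
            nxt = (position + word_to_steps(word)) % cells
            moves.add((divmod(position, MESH_WIDTH), divmod(nxt, MESH_WIDTH)))
            position = nxt
    return moves


def similarity_percent(original: str, suspect: str) -> float:
    """Percentage of the suspect's moves that also occur in the original."""
    original_moves = walk_pattern(original)
    suspect_moves = walk_pattern(suspect)
    if not suspect_moves:
        return 0.0
    return 100.0 * len(original_moves & suspect_moves) / len(suspect_moves)
```

For instance, `similarity_percent(doc_a, doc_b)` returns 100.0 when `doc_b` repeats the sentences of `doc_a` unchanged, and a lower value as sentences are altered or replaced. Comparing sets of moves rather than single cells is one possible way to make the score depend on the path and not only on the points visited.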