SimPaD: A word-similarity sentence-based plagiarism detection tool on Web documents

  • Authors:
  • Maria Soledad Pera; Yiu-Kai Ng

  • Affiliations:
  • -; (Corresponding author) 3361 TMCB, Computer Science Department, Brigham Young University, Provo, Utah, USA. E-mail: {ng,mpera}@cs.byu.edu

  • Venue:
  • Web Intelligence and Agent Systems
  • Year:
  • 2011

Abstract

Plagiarism is a serious problem: it infringes the copyright on documents and other materials, is unethical, and reduces the economic incentive of their legal owners. The problem is worsening as the growing number of online publications and easy access to the Web make it simpler to locate and paraphrase information. To address it, we propose a novel plagiarism-detection method, SimPaD, which (i) establishes the degree of resemblance between any two documents D1 and D2 from their sentence-to-sentence similarity, computed using pre-defined word-correlation factors, and (ii) generates a graphical view of the sentences that are similar (or identical) in D1 and D2. Experimental results verify that SimPaD is highly accurate in identifying both plagiarized and non-plagiarized documents and that it outperforms existing plagiarism-detection approaches.
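
The abstract does not give SimPaD's formulas, so the following is only a minimal sketch of the general idea of scoring document resemblance through sentence-to-sentence similarity built on word-correlation factors. Everything in it is an assumption made for illustration: the toy correlation table, the averaging and `min` used in `sentence_similarity`, the 0.7 threshold, and all function names are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical word-correlation factors in [0, 1]; toy values for illustration.
# The paper uses a pre-defined table whose contents are not given in the abstract.
TOY_CORRELATION = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("fast", "quick")): 0.8,
}


def word_correlation(w1, w2, table=TOY_CORRELATION):
    """Return 1.0 for identical words, else the table value (0.0 if absent)."""
    if w1 == w2:
        return 1.0
    return table.get(frozenset((w1, w2)), 0.0)


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", sentence.lower())


def split_sentences(document):
    """Naive sentence splitter on '.', '!' and '?' (an assumption)."""
    return [s.strip() for s in re.split(r"[.!?]+", document) if s.strip()]


def sentence_similarity(s1, s2):
    """Assumed measure: average best correlation per word, taken in both
    directions, keeping the smaller value so that a short sentence cannot
    fully match a much longer one."""
    w1, w2 = tokenize(s1), tokenize(s2)
    if not w1 or not w2:
        return 0.0

    def directed(a, b):
        return sum(max(word_correlation(x, y) for y in b) for x in a) / len(a)

    return min(directed(w1, w2), directed(w2, w1))


def matched_pairs(d1, d2, threshold=0.7):
    """List (index in D1, index in D2, score) for each D1 sentence whose
    best match in D2 reaches the (assumed) threshold."""
    sentences2 = split_sentences(d2)
    pairs = []
    for i, a in enumerate(split_sentences(d1)):
        score, j = max(
            ((sentence_similarity(a, b), j) for j, b in enumerate(sentences2)),
            default=(0.0, -1),
        )
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


def resemblance(d1, d2, threshold=0.7):
    """Assumed degree of resemblance: fraction of D1 sentences matched in D2."""
    n = len(split_sentences(d1))
    return len(matched_pairs(d1, d2, threshold)) / n if n else 0.0


if __name__ == "__main__":
    d1 = "The car is fast. I like tea."
    d2 = "The automobile is quick. Dogs bark loudly."
    print(matched_pairs(d1, d2))
    print(resemblance(d1, d2))
```

In this sketch the list returned by `matched_pairs` is the kind of data a graphical view of similar sentences could be drawn from, for example by linking sentence i of D1 to sentence j of D2. A paraphrase such as "car" to "automobile" still scores highly because the correlation table, rather than exact string equality, drives the comparison.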