Robust plagiary detection using semantic compression augmented SHAPD

  • Authors:
  • Dariusz Ceglarek;Konstanty Haniewicz;Wojciech Rutkowski

  • Affiliations:
  • Poznan School of Banking, Poland;Poznan University of Economics, Poland;Ciber, Poland

  • Venue:
  • ICCCI'12 Proceedings of the 4th international conference on Computational Collective Intelligence: technologies and applications - Volume Part I
  • Year:
  • 2012

Abstract

This work presents results of ongoing novel research in the areas of semantic networks, plagiarism detection and general natural language processing. The results presented here demonstrate that semantic compression is a valuable addition to existing methods of plagiary detection. Applying semantic compression boosts the efficiency of the Sentence Hashing Algorithm for Plagiarism Detection (SHAPD) and of the authors' implementation of the w-shingling algorithm. Tests were also performed with the traditional Vector Space Model (VSM) method. Contrary to general belief, they demonstrated that this technique is not well suited for plagiary detection. All experiments were performed on a publicly available corpus, built so that the analysis can be compared with the efforts of other research teams.
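The abstract does not spell out how w-shingling compares documents or why semantic compression helps it. What follows is a minimal sketch, assuming classic w-shingling (sets of contiguous w-word windows compared by Jaccard resemblance). It approximates semantic compression with a small hypothetical word-to-concept dictionary (`concept_map`), which stands in for a semantic network. It is not the authors' implementation, and SHAPD itself is not shown. The function names and sample sentences are illustrative only.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep only alphabetic word tokens.
    return re.findall(r"[a-z]+", text.lower())


def compress(tokens: list[str], concept_map: dict[str, str]) -> list[str]:
    # Hypothetical stand-in for semantic compression: replace each word with a
    # more general concept when the illustrative map has one, else keep it.
    return [concept_map.get(t, t) for t in tokens]


def shingles(tokens: list[str], w: int = 3) -> set[tuple[str, ...]]:
    # Set of contiguous w-token windows (w-shingles).
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def resemblance(a: list[str], b: list[str], w: int = 3) -> float:
    # Jaccard coefficient of the two shingle sets: |A & B| / |A | B|.
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    original = "the cat sat on the mat"
    suspect = "the kitten sat on the rug"
    # Illustrative concept map; a real system would derive generalizations
    # from a semantic network rather than a hand-written dictionary.
    concept_map = {"cat": "feline", "kitten": "feline",
                   "mat": "floor_covering", "rug": "floor_covering"}

    raw = resemblance(tokenize(original), tokenize(suspect))
    compressed = resemblance(compress(tokenize(original), concept_map),
                             compress(tokenize(suspect), concept_map))
    print(f"raw: {raw:.3f}, compressed: {compressed:.3f}")
```

On the two sample sentences, the raw shingle sets share only one of seven distinct 3-shingles (resemblance 1/7). The compressed versions match exactly (resemblance 1.0). This illustrates, on toy data only, why generalizing words can expose paraphrased borrowing that plain shingling misses.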