A plagiarism detection system for arabic text-based documents

  • Authors:
  • Ameera Jadalla;Ashraf Elnagar

  • Affiliations:
  • Department of Computer Science, University of Sharjah, Sharjah, UAE;Department of Computer Science, University of Sharjah, Sharjah, UAE

  • Venue:
  • PAISI'12 Proceedings of the 2012 Pacific Asia conference on Intelligence and Security Informatics
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents a novel plagiarism detection system for Arabic text-based documents, Iqtebas 1.0. This is a primary work dedicated for plagiarism of Arabic based documents. Arabic is a rich morphological language that is among the top used languages in the world and in the Internet as well. Given a document and a set of suspected files, our goal is to compute the originality value of the examined document. The originality value of a text is computed by computing the distance between each sentence in the text and the closest sentence in the suspected files, if exists. The proposed system structure is based on a search engine in order to reduce the cost of pairwise similarity. For the indexing process, we use the winnowing n-gram fingerprinting algorithm to reduce the index size. The fingerprints of each sentence are its n-grams that are represented by hash codes. The winnowing algorithm computes fingerprints for each sentence. As a result, the search time is improved and the detection process is accurate and robust. The experimental results showed superb performance of Iqtebas 1.0 as it achieved a recall value of 94% and a precision of 99%.Moreover, a comparison that is carried out between Iqtebas and the well known plagiarism detection system, SafeAssign, confirmed the high performance of Iqtebas.