A plagiarism detection system for Arabic text-based documents

Authors:
Ameera Jadalla; Ashraf Elnagar
Affiliations:
Department of Computer Science, University of Sharjah, Sharjah, UAE; Department of Computer Science, University of Sharjah, Sharjah, UAE
Venue:
PAISI'12 Proceedings of the 2012 Pacific Asia conference on Intelligence and Security Informatics
Year:
2012

Abstract

This paper presents Iqtebas 1.0, a novel plagiarism detection system for Arabic text-based documents. This is a primary work dedicated to detecting plagiarism in Arabic documents. Arabic is a morphologically rich language and is among the most widely used languages in the world and on the Internet. Given a document and a set of suspected files, our goal is to compute the originality value of the examined document. The originality value of a text is computed from the distance between each of its sentences and the closest sentence in the suspected files, if one exists. The proposed system is built on a search engine in order to reduce the cost of pairwise similarity computation. For the indexing process, we use the winnowing n-gram fingerprinting algorithm to reduce the index size: the n-grams of each sentence are represented by hash codes, and the winnowing algorithm computes the fingerprints of each sentence from them. As a result, the search time is improved and the detection process is accurate and robust. The experimental results showed strong performance: Iqtebas 1.0 achieved a recall of 94% and a precision of 99%. Moreover, a comparison between Iqtebas and the well-known plagiarism detection system SafeAssign confirmed the high performance of Iqtebas.
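
The sentence-level fingerprinting and originality computation described in the abstract can be illustrated with a minimal Python sketch. It is not the Iqtebas implementation: the abstract does not state the n-gram unit, the hash function, the window size, or the distance measure, so the character 5-grams, the truncated MD5 hash, the window of 4, the Jaccard similarity, and the averaging in `originality` are all assumptions made for illustration, as are the function names.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every character k-gram of `text` after removing whitespace.

    Assumption: character k-grams and a truncated MD5 hash; the abstract
    does not specify either choice.
    """
    s = "".join(text.split())
    return [
        int.from_bytes(hashlib.md5(s[i:i + k].encode("utf-8")).digest()[:4], "big")
        for i in range(len(s) - k + 1)
    ]


def winnow(hashes, w):
    """Winnowing: keep the minimum hash of each window of `w` hashes.

    Ties are broken by taking the rightmost minimum. Returns a set of
    (position, hash) pairs, so a hash selected by several overlapping
    windows is recorded once per position.
    """
    if not hashes:
        return set()
    w = min(w, len(hashes))  # a short input is treated as a single window
    selected = set()
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        offset = w - 1 - window[::-1].index(smallest)  # rightmost minimum
        selected.add((start + offset, smallest))
    return selected


def fingerprints(sentence, k=5, w=4):
    """Set of winnowed hash values for one sentence (positions dropped)."""
    return {h for _, h in winnow(kgram_hashes(sentence, k), w)}


def jaccard(a, b):
    """Jaccard similarity of two fingerprint sets (assumed measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def originality(doc_sentences, suspect_sentences, k=5, w=4):
    """Average, over the document's sentences, of 1 - (best similarity to
    any suspect sentence). The averaging rule is an assumption."""
    suspect_fps = [fingerprints(s, k, w) for s in suspect_sentences]
    scores = []
    for sentence in doc_sentences:
        fp = fingerprints(sentence, k, w)
        best = max((jaccard(fp, sfp) for sfp in suspect_fps), default=0.0)
        scores.append(1.0 - best)
    return sum(scores) / len(scores) if scores else 1.0
```

Because each sentence is reduced to a small set of selected hashes, an index only needs to store those hashes rather than every n-gram, which is the index-size reduction the abstract refers to. Since the text is encoded as UTF-8 before hashing, the same functions accept Arabic sentences unchanged; any Arabic-specific preprocessing (such as stemming or diacritic removal) is outside this sketch.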