Understanding Plagiarism Linguistic Patterns, Textual Features, and Detection Methods

Authors:
Salha M. Alzahrani;Naomie Salim;Ajith Abraham
Affiliations:
Faculty of Computer Science and Information Systems, Taif University, Alhawiah, Saudi Arabia;Faculty of Computer Science and Information Systems, University of Technology Malaysia, Skudai, Malaysia;VSB-Technical University of Ostrava, Czech Republic
Venue:
IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews
Year:
2012

Citing 0
Cited 5

An improved plagiarism detection scheme based on semantic role labeling
Applied Soft Computing

Using structural information and citation evidence to detect significant plagiarism cases in scientific publications
Journal of the American Society for Information Science and Technology

Analysis and extraction of sentence-level paraphrase sub-corpus in CS education
Proceedings of the 13th annual conference on Information technology education

Online plagiarism detection through exploiting lexical, syntactic, and semantic information
ACL '12 Proceedings of the ACL 2012 System Demonstrations

Increasing recall for text re-use in historical documents to support research in the humanities
TPDL'12 Proceedings of the Second international conference on Theory and Practice of Digital Libraries

Abstract

Plagiarism takes many forms, ranging from copying text to adopting ideas, without giving credit to the originator. This paper presents a new taxonomy of plagiarism that highlights the differences between literal plagiarism and intelligent plagiarism from the plagiarist's behavioral point of view. The taxonomy supports a deep understanding of the linguistic patterns involved in committing plagiarism, for example, rewriting text into a semantically equivalent form with different words and organization, shortening text through concept generalization and specification, and adopting the ideas and important contributions of others. The textual features that characterize each plagiarism type are discussed. Systematic frameworks and methods for monolingual, extrinsic, intrinsic, and cross-lingual plagiarism detection are surveyed and correlated with the plagiarism types listed in the taxonomy. We conduct an extensive study of state-of-the-art techniques for plagiarism detection, including character n-gram-based (CNG), vector-based (VEC), syntax-based (SYN), semantic-based (SEM), fuzzy-based (FUZZY), structural-based (STRUC), stylometric-based (STYLE), and cross-lingual (CROSS) techniques. Our study corroborates that existing plagiarism detection systems focus on copied text but fail to detect intelligent plagiarism, in which ideas are presented in different words.
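
To illustrate the character n-gram-based (CNG) family named in the abstract, the following is a minimal sketch, not the authors' method. It assumes character trigrams and Jaccard overlap as the similarity measure; the function names, the choice of n = 3, and the example sentences are all hypothetical. It shows why such a measure scores a literal copy highly while a paraphrase of the same idea scores low.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a text.

    The text is lowercased and runs of whitespace are collapsed
    (an assumed normalization, chosen only for this illustration).
    """
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the character n-gram sets of two texts."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0  # both texts are shorter than n characters
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical example sentences.
source = "Plagiarism detection systems compare suspicious documents against known sources."
literal_copy = "Plagiarism detection systems compare suspicious documents against known sources."
paraphrase = "Tools that spot copied work match questionable papers with existing references."

literal_score = jaccard_similarity(source, literal_copy)
paraphrase_score = jaccard_similarity(source, paraphrase)

print(f"literal copy: {literal_score:.2f}")
print(f"paraphrase:   {paraphrase_score:.2f}")
```

The literal copy shares every trigram with the source, whereas the paraphrase shares very few even though it expresses the same idea, which is consistent with the abstract's observation that surface-level comparison misses intelligent plagiarism.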