Plagiarism is the use of the language and ideas of another work and their representation as one's own original work. Plagiarism occurs at various levels in many domains, and in academic papers in particular, so diverse efforts have been made to detect it automatically. In this research, we developed software capable of simple plagiarism detection. We built a corpus (C) of 10,100 computer science academic papers written in English, and two test sets of papers randomly chosen from C. We developed a wide variety of baseline methods to identify identical or similar papers, several of which are novel. The experimental results and their analysis yield interesting findings, and some of the novel methods are among the best predictive methods.
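The abstract does not specify which similarity measures the baseline methods use. A common starting point for detecting identical or similar documents is shingling: compare the sets of word n-grams of two papers with the Jaccard coefficient. The sketch below is a minimal illustration of that standard technique, not the paper's actual method; the sample sentences and the choice of n = 3 are assumptions for the example.

```python
def shingles(text, n=3):
    """Return the set of lowercase word n-grams (shingles) of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical near-duplicate pair for illustration.
doc1 = "plagiarism is the use of the language and thoughts of another work"
doc2 = "plagiarism is the use of the words and ideas of another work"
score = jaccard(shingles(doc1), shingles(doc2))
```

Identical papers score 1.0, unrelated papers score near 0.0, and partially reused text falls in between; a threshold on this score gives a simple detector of the kind the abstract describes.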