Plagiarism is the use of the language and ideas of another work and their representation as one's own original work. Plagiarism occurs at various levels in many domains, and in academic papers in particular, so diverse efforts have been made to detect it automatically. In this research, we developed software capable of simple plagiarism detection. We built a corpus (C) of 10,100 computer science academic papers written in English, and two test sets of papers randomly chosen from C. We developed a wide variety of baseline methods to identify identical or similar papers, several of which are novel. The experimental results and their analysis yield interesting findings, and some of the novel methods are among the best predictive methods.
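The abstract does not specify which similarity measures the baseline methods use. A common starting point for detecting identical or similar documents is shingling: compare the sets of word n-grams of two papers with the Jaccard coefficient. The sketch below is a minimal illustration of that standard technique, not the paper's actual method; the sample sentences and the choice of n = 3 are assumptions for the example.

```python
def shingles(text, n=3):
    """Return the set of lowercase word n-grams (shingles) of a document."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical near-duplicate pair for illustration.
doc1 = "plagiarism is the use of the language and thoughts of another work"
doc2 = "plagiarism is the use of the words and ideas of another work"
score = jaccard(shingles(doc1), shingles(doc2))
```

Identical papers score 1.0, unrelated papers score near 0.0, and partially reused text falls in between; a threshold on this score gives a simple detector of the kind the abstract describes.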