Analysis and extraction of sentence-level paraphrase sub-corpus in CS education

Authors:
Faisal Alvi;El-Sayed M. El-Alfy;Wasfi G. Al-Khatib;Radwan E. Abdel-Aal
Affiliations:
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia (all authors)
Venue:
Proceedings of the 13th annual conference on Information technology education
Year:
2012

Abstract

Since the advent of the Internet, plagiarism has become a widespread problem in student submissions. Paraphrasing is one of several forms of plagiarism that students use to mask the original source. In this work, we construct a sub-corpus of paraphrased sentences by extracting all lightly and heavily revised sentences from the Corpus of Plagiarized Short Answers, using modified criteria for what counts as a sentence. We then apply document similarity measures to this sub-corpus and derive several of its characteristic features. Our findings suggest that the sub-corpus is better suited than the original corpus for testing paraphrase detection techniques, because it provides sentence-level paraphrase samples rather than the file-level classification of the original. Additional sentence samples may be added to the sub-corpus to increase its variety and scale.
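As an illustration of what applying similarity measures at the sentence level can look like, the following is a minimal sketch, not the authors' method. The abstract does not name the measures used; word n-gram resemblance (Jaccard) and containment are assumed here only because they are common choices in copy detection. The function names and the three example sentences are hypothetical and are not taken from the corpus.

```python
import re


def word_ngrams(sentence, n=2):
    """Return the set of lowercase word n-grams in a sentence."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def resemblance(a, b, n=2):
    """Jaccard resemblance: shared n-grams over all distinct n-grams."""
    ga, gb = word_ngrams(a, n), word_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def containment(suspect, source, n=2):
    """Fraction of the suspect's n-grams that also occur in the source."""
    gs, go = word_ngrams(suspect, n), word_ngrams(source, n)
    return len(gs & go) / len(gs) if gs else 0.0


# Invented sentences standing in for a source sentence and two revisions.
source = "Inheritance is a basic concept of object oriented programming"
light = "Inheritance is a basic concept in object oriented programming"
heavy = "In object oriented programming, classes can derive from other classes"

if __name__ == "__main__":
    for label, sentence in (("light", light), ("heavy", heavy)):
        print(label,
              round(resemblance(source, sentence), 3),
              round(containment(sentence, source), 3))
```

With these invented sentences, the lightly revised sentence scores higher against the source than the heavily revised one on both measures. Scoring sentence pairs rather than whole files is what a sentence-level sub-corpus makes possible.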