Developing a corpus of plagiarised short answers

Authors:
Paul Clough; Mark Stevenson
Affiliations:
Department of Information Studies, University of Sheffield, Sheffield, UK S1 4DP; Department of Computer Science, University of Sheffield, Sheffield, UK S1 4DP
Venue:
Language Resources and Evaluation
Year:
2011

Abstract

Plagiarism is widely acknowledged to be a significant and growing problem for higher education institutions (McCabe 2005; Judge 2008). A wide range of solutions, including several commercial systems, has been proposed to help educators identify plagiarised work, or even to detect it automatically. Direct comparison of these systems is hampered by the difficulty of obtaining genuine examples of plagiarised student work. We describe our initial experiences in constructing a corpus of answers to short questions in which plagiarism has been simulated. The corpus is designed to represent types of plagiarism that are not included in existing corpora, and it will be a useful addition to the resources available for evaluating plagiarism detection systems.