Although paraphrasing is the linguistic mechanism underlying many plagiarism cases, little attention has been paid to its analysis in the framework of automatic plagiarism detection. As a result, state-of-the-art plagiarism detectors find it difficult to detect cases of paraphrase plagiarism. In this article, we analyze the relationship between paraphrasing and plagiarism, paying special attention to which paraphrase phenomena underlie acts of plagiarism and which of them are detected by plagiarism detection systems. To this end, we created the P4P corpus, a new resource that uses a paraphrase typology to annotate a subset of the PAN-PC-10 corpus for automatic plagiarism detection. The results of the Second International Competition on Plagiarism Detection were analyzed in the light of this annotation. The presented experiments show that (i) more complex paraphrase phenomena and a high density of paraphrase mechanisms make plagiarism detection more difficult, (ii) lexical substitutions are the paraphrase mechanisms used the most when plagiarizing, and (iii) paraphrase mechanisms tend to shorten the plagiarized text. For the first time, the paraphrase mechanisms behind plagiarism have been analyzed, providing critical insights for the improvement of automatic plagiarism detection systems.