A new approach for cross-language plagiarism analysis

  • Authors:
  • Rafael Corezola Pereira;Viviane P. Moreira;Renata Galante

  • Affiliations:
  • Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil;Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil;Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil

  • Venue:
  • CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
  • Year:
  • 2010

Abstract

This paper presents a new method for Cross-Language Plagiarism Analysis. Our task is to detect the plagiarized passages in the suspicious documents and their corresponding fragments in the source documents. We propose a plagiarism detection method composed of five main phases: language normalization, retrieval of candidate documents, classifier training, plagiarism analysis, and post-processing. To evaluate our method, we created a corpus containing artificial plagiarism offenses. We conducted two experiments: the first considers only monolingual plagiarism cases, while the second considers only cross-language plagiarism cases. The cross-language experiment achieved 86% of the performance of the monolingual baseline. We also analyzed how the length of the plagiarized text affects the overall performance of the method. This analysis showed that our method achieved better results on medium-sized and long plagiarized passages.
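The five phases listed above can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how any phase works. Here, every concrete choice is hypothetical. `TOY_DICT` is a word-by-word lookup standing in for language normalization (translation into one pivot language). Jaccard word overlap stands in for both candidate retrieval and passage comparison. A one-feature threshold "decision stump" stands in for the trained classifier. Analysis is done per sentence, and post-processing merges consecutive flagged sentences from the same source.

```python
from dataclasses import dataclass

# Hypothetical Spanish->English dictionary standing in for real translation.
TOY_DICT = {"el": "the", "gato": "cat", "come": "eats", "pescado": "fish",
            "perro": "dog", "duerme": "sleeps"}


@dataclass
class Detection:
    source: str      # name of the source document
    susp_start: int  # first flagged suspicious sentence (inclusive)
    susp_end: int    # last flagged suspicious sentence (inclusive)


def normalize(text: str, dictionary: dict[str, str]) -> list[str]:
    """Phase 1 (assumed): map every token into a single pivot language."""
    return [dictionary.get(tok, tok) for tok in text.lower().split()]


def jaccard(a: list[str], b: list[str]) -> float:
    """Word-set overlap, used here as the only similarity feature."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve_candidates(susp: list[str], sources: dict[str, list[str]], k: int) -> list[str]:
    """Phase 2 (assumed): keep the k sources most similar to the whole suspicious document."""
    ranked = sorted(sources, key=lambda n: jaccard(susp, sources[n]), reverse=True)
    return ranked[:k]


def train_threshold(examples: list[tuple[float, bool]]) -> float:
    """Phase 3 (assumed): pick the similarity cut-off that best separates
    labeled (similarity, is_plagiarized) pairs, i.e. a one-feature stump."""
    best_t, best_acc = 0.5, -1
    for t, _ in examples:
        acc = sum((s >= t) == label for s, label in examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def analyze(susp, sources, candidates, threshold):
    """Phase 4 (assumed): flag each suspicious sentence that matches some
    sentence of a candidate source with similarity >= threshold."""
    hits = []
    for i, sent in enumerate(susp):
        for name in candidates:
            if any(jaccard(sent, src) >= threshold for src in sources[name]):
                hits.append((name, i))
    return hits


def post_process(hits) -> list[Detection]:
    """Phase 5 (assumed): merge hits on consecutive sentences of one source."""
    merged: list[Detection] = []
    for name, i in sorted(hits):
        if merged and merged[-1].source == name and merged[-1].susp_end == i - 1:
            merged[-1].susp_end = i
        else:
            merged.append(Detection(name, i, i))
    return merged


def detect(susp_sents, source_texts, dictionary, threshold, k=2):
    susp = [normalize(s, dictionary) for s in susp_sents]
    sources = {n: [normalize(s, dictionary) for s in ss] for n, ss in source_texts.items()}
    flat = lambda sents: [tok for s in sents for tok in s]
    candidates = retrieve_candidates(flat(susp), {n: flat(ss) for n, ss in sources.items()}, k)
    return post_process(analyze(susp, sources, candidates, threshold))


# Hypothetical toy data: English suspicious text, Spanish sources.
SUSPICIOUS = ["the cat eats fish", "the dog sleeps", "completely original text"]
SOURCES = {"es1": ["el gato come pescado", "el perro duerme"], "es2": ["hola mundo"]}

threshold = train_threshold([(0.9, True), (0.8, True), (0.2, False), (0.1, False)])
print(detect(SUSPICIOUS, SOURCES, TOY_DICT, threshold, k=1))
# prints [Detection(source='es1', susp_start=0, susp_end=1)]
```

In this toy run the learned cut-off is 0.8, only `es1` survives retrieval, and the two consecutive translated sentences are merged into one detected passage. A real system would replace each stand-in (dictionary lookup, Jaccard overlap, single-threshold classifier) with stronger components.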