A new approach for cross-language plagiarism analysis

Authors:
Rafael Corezola Pereira;Viviane P. Moreira;Renata Galante
Affiliations:
Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil;Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil;Instituto de Informática, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
Venue:
CLEF'10 Proceedings of the 2010 international conference on Multilingual and multimodal information access evaluation: cross-language evaluation forum
Year:
2010

Abstract

This paper presents a new method for cross-language plagiarism analysis. Our task is to detect the plagiarized passages in suspicious documents and their corresponding fragments in the source documents. We propose a plagiarism detection method composed of five main phases: language normalization, retrieval of candidate documents, classifier training, plagiarism analysis, and post-processing. To evaluate our method, we created a corpus containing artificial plagiarism offenses. Two experiments were conducted: the first considers only monolingual plagiarism cases, while the second considers only cross-language plagiarism cases. The results showed that the cross-language experiment achieved 86% of the performance of the monolingual baseline. We also analyzed how the length of the plagiarized text affects the overall performance of the method; this analysis showed that our method achieves better results with medium and large plagiarized passages.
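To make the order of the five phases concrete, here is a minimal Python sketch of how they could be chained. It is not the authors' implementation: every stage is a toy stand-in chosen for illustration, and all names (`normalize`, `containment`, `retrieve_candidates`, `train_threshold`, `analyse`, `post_process`, the tiny lexicon and documents) are hypothetical. Language normalization is approximated by a word-to-word lexicon lookup into a pivot language, candidate retrieval and passage analysis by a token containment score, and the trained classifier by a one-feature decision stump over that score.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    start: int   # first flagged token index in the suspicious document
    end: int     # one past the last flagged token index
    source: str  # id of the matching source document


def normalize(text, lexicon):
    # Phase 1 (toy): lowercase, then map each token into a pivot language with
    # a hypothetical word-to-word lexicon; unknown tokens are kept unchanged.
    return [lexicon.get(tok, tok) for tok in text.lower().split()]


def containment(a, b):
    # Fraction of the distinct tokens of `a` that also occur in `b`.
    sa = set(a)
    return len(sa & set(b)) / len(sa) if sa else 0.0


def retrieve_candidates(susp, sources, k=1):
    # Phase 2 (toy): keep the k source ids with the highest containment score.
    ranked = sorted(sources, key=lambda sid: containment(susp, sources[sid]), reverse=True)
    return ranked[:k]


def train_threshold(examples):
    # Phase 3 (toy stand-in for a trained classifier): a one-feature decision
    # stump that picks the score threshold with the best training accuracy.
    # `examples` is a list of (score, is_plagiarized) pairs.
    best_t, best_acc = 0.0, -1.0
    for t, _ in examples:
        acc = sum((s >= t) == label for s, label in examples) / len(examples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def analyse(susp, sources, candidates, threshold, window=4):
    # Phase 4 (toy): slide a fixed-size token window over the suspicious text
    # and flag windows whose containment in a candidate source reaches the threshold.
    hits = []
    for i in range(max(len(susp) - window + 1, 1)):
        chunk = susp[i:i + window]
        for sid in candidates:
            if containment(chunk, sources[sid]) >= threshold:
                hits.append(Detection(i, i + len(chunk), sid))
    return hits


def post_process(hits):
    # Phase 5 (toy): merge overlapping or adjacent flagged windows per source.
    merged = []
    for h in sorted(hits, key=lambda d: (d.source, d.start)):
        last = merged[-1] if merged else None
        if last is not None and last.source == h.source and h.start <= last.end:
            last.end = max(last.end, h.end)
        else:
            merged.append(Detection(h.start, h.end, h.source))
    return merged


# Toy run: hypothetical Portuguese-to-English lexicon and documents.
LEXICON = {"o": "the", "gato": "cat", "come": "eats", "peixe": "fish"}
sources = {
    "src1": normalize("the cat eats fish every day", {}),
    "src2": normalize("stock markets fell sharply today", {}),
}
susp = normalize("ontem choveu muito e o gato come peixe", LEXICON)
cands = retrieve_candidates(susp, sources)
t = train_threshold([(0.9, True), (0.8, True), (0.2, False), (0.1, False)])
found = post_process(analyse(susp, sources, cands, t))
print(found)  # [Detection(start=4, end=8, source='src1')]
```

A real system would replace each stand-in (for example, full machine translation instead of the lexicon lookup, and a richer feature set instead of the single-score stump); the skeleton only illustrates that each phase consumes the output of the previous one.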