This paper presents a new method for Cross-Language Plagiarism Analysis. The task is to detect plagiarized passages in suspicious documents and their corresponding fragments in the source documents. We propose a plagiarism detection method composed of five main phases: language normalization, retrieval of candidate documents, classifier training, plagiarism analysis, and post-processing. To evaluate the method, we created a corpus containing artificial plagiarism offenses. Two experiments were conducted: the first considers only monolingual plagiarism cases, while the second considers only cross-language plagiarism cases. The cross-language experiment achieved 86% of the performance of the monolingual baseline. We also analyzed how the length of the plagiarized text affects the overall performance of the method; this analysis showed that the method achieves better results on medium and large plagiarized passages.