Research on intrinsic plagiarism detection resolution: a supervised learning approach

  • Authors:
  • Xiuli Hua;Shoushan Li;Peifeng Li;Qiaoming Zhu

  • Affiliations:
  • Natural Language Processing Lab, Soochow University, Suzhou, Jiangsu, China; School of Computer Science & Technology, Soochow University, Suzhou, Jiangsu, China (all four authors)

  • Venue:
  • CLSW'12 Proceedings of the 13th Chinese conference on Chinese Lexical Semantics
  • Year:
  • 2012

Abstract

Existing research on text plagiarism detection focuses mainly on external plagiarism detection, which assumes that a reference collection is given and compares suspicious documents against this collection to find plagiarized articles with high similarity. Existing approaches perform well at identifying externally plagiarized sections. However, in the real world, the reference collection is impossible to obtain. This paper focuses on this case and proposes an intrinsic plagiarism detection framework based on a supervised machine learning approach. The instance creation and feature selection methods are presented in detail. Experimental results on the PAN'09 corpus demonstrate the effectiveness of our approach to intrinsic plagiarism detection.
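
The abstract does not say which features, segment sizes, or classifier the authors use, so the following is only a rough sketch of what instance creation for supervised intrinsic detection might look like. Everything in it is an assumption made for illustration: the three stylometric features, the character-window segmentation, the majority-overlap labeling rule, and the names `style_features` and `create_instances` are hypothetical and not taken from the paper.

```python
import re
from typing import List, Tuple


def style_features(text: str) -> List[float]:
    """Three simple stylometric features (illustrative, not the paper's feature set)."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return [0.0, 0.0, 0.0]
    avg_word_len = sum(len(w) for w in words) / len(words)
    type_token_ratio = len(set(words)) / len(words)
    punct_ratio = sum(ch in ",;:.!?" for ch in text) / max(len(text), 1)
    return [avg_word_len, type_token_ratio, punct_ratio]


def create_instances(
    doc: str,
    plagiarized_spans: List[Tuple[int, int]],
    window: int = 200,
    step: int = 100,
) -> List[Tuple[List[float], int]]:
    """Cut one document into overlapping character windows, one instance per window.

    Features: absolute deviation of the window's style from the whole document's
    style, so no external reference collection is needed.
    Label: 1 if more than half of the window lies inside an annotated plagiarized
    span (spans are (start, end) character offsets, assumed non-overlapping), else 0.
    """
    doc_vec = style_features(doc)
    instances = []
    for start in range(0, max(len(doc) - window, 0) + 1, step):
        end = min(start + window, len(doc))
        seg_vec = style_features(doc[start:end])
        features = [abs(s - d) for s, d in zip(seg_vec, doc_vec)]
        overlap = sum(
            max(0, min(end, b) - max(start, a)) for a, b in plagiarized_spans
        )
        label = 1 if overlap > (end - start) / 2 else 0
        instances.append((features, label))
    return instances
```

Under these assumptions, the (features, label) pairs built from an annotated corpus such as PAN'09 could be passed to any standard supervised classifier, which would then flag stylistically deviant segments in unseen documents.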