Online plagiarism detection through exploiting lexical, syntactic, and semantic information

  • Authors: Wan-Yu Lin; Nanyun Peng; Chun-Chao Yen; Shou-de Lin

  • Affiliations: National Taiwan University; Peking University; National Taiwan University; National Taiwan University

  • Venue: ACL '12 Proceedings of the ACL 2012 System Demonstrations
  • Year: 2012

Abstract

In this paper, we introduce a framework that identifies online plagiarism by exploiting lexical, syntactic, and semantic features, including duplication-gram, reordering and alignment of words, POS and phrase tags, and semantic similarity of sentences. We establish an ensemble framework to combine the predictions of each model. Results demonstrate that our system not only finds a considerable number of real-world online plagiarism cases but also outperforms several state-of-the-art algorithms and commercial software.
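
To make the idea of combining per-feature predictions concrete, the following is a minimal sketch and not the authors' implementation. It assumes two simple lexical scorers (shared word n-grams and an order-insensitive word-set overlap) and a weighted average as the combiner. The function names, the weights, and the choice of weighted averaging are illustrative assumptions; the abstract does not specify how the paper's models or ensemble are defined.

```python
from typing import Callable, Dict, Set, Tuple

Scorer = Callable[[str, str], float]


def word_ngrams(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of lowercased word n-grams in the text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(suspect: str, source: str, n: int = 3) -> float:
    """Fraction of the suspect's word n-grams that also occur in the source."""
    suspect_grams = word_ngrams(suspect, n)
    if not suspect_grams:
        return 0.0
    return len(suspect_grams & word_ngrams(source, n)) / len(suspect_grams)


def word_set_overlap(suspect: str, source: str) -> float:
    """Jaccard similarity of word sets; insensitive to word reordering."""
    a = set(suspect.lower().split())
    b = set(source.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def ensemble_score(suspect: str, source: str,
                   models: Dict[str, Scorer],
                   weights: Dict[str, float]) -> float:
    """Weighted average of the individual model scores (assumed combiner)."""
    total = sum(weights[name] for name in models)
    return sum(weights[name] * models[name](suspect, source)
               for name in models) / total


if __name__ == "__main__":
    models = {"ngram": ngram_overlap, "word_set": word_set_overlap}
    weights = {"ngram": 0.5, "word_set": 0.5}  # hypothetical weights
    suspect = "the quick brown fox jumps"
    source = "the quick brown fox sleeps"
    print(ensemble_score(suspect, source, models, weights))
```

A syntactic or semantic scorer with the same two-string signature could be added to the `models` dictionary without changing the combiner, which is the main appeal of an ensemble design of this kind.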