Web Based Cross Language Plagiarism Detection

  • Authors:
  • Chow Kok Kent; Naomie Salim

  • Venue:
  • CIMSIM '10 Proceedings of the 2010 Second International Conference on Computational Intelligence, Modelling and Simulation
  • Year:
  • 2010

Abstract

As the Internet helps us cross language and cultural borders by providing various translation tools, cross-language plagiarism, also known as translation plagiarism, is bound to arise. In this paper, we propose a new approach to detecting cross-language plagiarism. To limit the scope of the proposed system, we consider Bahasa Melayu as the input language of the submitted query document and English as the target language of similar, possibly plagiarised documents. Input documents are translated into English using the Google Translate API before undergoing a pre-processing phase (stemming and removal of stop words). The tokenized documents are sent to the Google AJAX Search API to find similar documents on the World Wide Web. Only the top ten sources retrieved by the Google Search API are considered as candidate source documents. We integrate the Stanford Parser and WordNet to determine the level of similarity between the suspected documents and the candidate source documents. After that, a detailed similarity analysis is performed and a report of the results is produced.
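
The pre-processing and candidate-ranking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the translation and web search calls are omitted, the stop-word list and the suffix-stripping stemmer are simplified stand-ins, and Jaccard overlap is used in place of the Stanford Parser and WordNet comparison. All function names (`naive_stem`, `preprocess`, `jaccard`, `rank_candidates`) are hypothetical.

```python
import re

# Tiny illustrative stop-word list (assumption: the abstract does not give one).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "by"}


def naive_stem(word):
    """Crude suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are not destroyed.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def jaccard(tokens_a, tokens_b):
    """Set overlap of two token lists (assumed substitute for the
    parser- and WordNet-based similarity named in the abstract)."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def rank_candidates(query_text, candidates, top_k=10):
    """Score each candidate text against the (already translated) query and
    keep the top_k, mirroring the top-ten cutoff mentioned in the abstract."""
    query_tokens = preprocess(query_text)
    scored = [
        (jaccard(query_tokens, preprocess(text)), index)
        for index, text in enumerate(candidates)
    ]
    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]
```

In this sketch, `rank_candidates` returns `(score, index)` pairs, so a caller could map the indices back to the retrieved source documents before running a more detailed comparison.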