The impact of text pre-processing to determine the similarity in students assignments

  • Authors:
  • Daniela Chudá;Ján Chlpek;Andrej Kumor

  • Affiliations:
  • Slovak University of Technology in Bratislava, Slovak Republic;Slovak University of Technology in Bratislava, Slovak Republic;Slovak University of Technology in Bratislava, Slovak Republic

  • Venue:
  • Proceedings of the 12th International Conference on Computer Systems and Technologies
  • Year:
  • 2011

Abstract

This paper examines the problem of plagiarism in student assignments. We focus on pre-processing techniques for assignments written in Slovak: stop-word removal, synonym replacement, lemmatization, and the use of a readability index. The main goal is to find out whether an original student assignment and a plagiarised version of it can be identified based on their readability. Based on the results of further experiments, we determine which combinations of pre-processing techniques and similarity measures are most suitable for detecting similarity between student assignments as accurately as possible, and, for each technique, how well it detects the categorised types of plagiarism.
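
To illustrate how pre-processing can change a similarity score, here is a minimal Python sketch. It is not the authors' implementation. The stop-word set, the synonym map, and the function names `preprocess` and `jaccard_similarity` are hypothetical, and Jaccard similarity over word sets is assumed only as a stand-in for the similarity methods the paper compares. Lemmatization and the readability index are omitted because the abstract does not specify how they are computed.

```python
import re

# Hypothetical, tiny resources for illustration only; a real system would
# load a full Slovak stop-word list and a synonym dictionary.
STOP_WORDS = {"a", "je", "na", "v", "sa", "to"}
SYNONYMS = {"automobil": "auto", "vozidlo": "auto"}


def preprocess(text, remove_stop_words=True, replace_synonyms=True):
    """Lowercase and tokenize the text, then apply the selected steps."""
    tokens = re.findall(r"\w+", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    if replace_synonyms:
        # Map each synonym to one canonical word so paraphrases match.
        tokens = [SYNONYMS.get(t, t) for t in tokens]
    return tokens


def jaccard_similarity(tokens_a, tokens_b):
    """Return |A ∩ B| / |A ∪ B| over the two sets of distinct tokens."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0  # two empty documents are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    original = "Automobil je na ceste"
    suspect = "Vozidlo je na ceste"

    raw = jaccard_similarity(
        preprocess(original, remove_stop_words=False, replace_synonyms=False),
        preprocess(suspect, remove_stop_words=False, replace_synonyms=False),
    )
    cleaned = jaccard_similarity(preprocess(original), preprocess(suspect))
    print(f"without pre-processing: {raw:.2f}")
    print(f"with pre-processing:    {cleaned:.2f}")
```

In this toy example the synonym swap lowers the raw score, while the pre-processed texts become identical. Switching individual steps on and off in this way is one simple method of comparing combinations of techniques on a labelled set of assignments.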