A Replicated Comparative Study of Source Code Authorship Attribution

  • Authors: Matthew F. Tennyson
  • Venue: RESER '13: Proceedings of the 2013 3rd International Workshop on Replication in Empirical Software Engineering Research
  • Year: 2013

Abstract

Source code authorship attribution is the task of deciding who wrote a piece of software given its source code. Applications include software forensics, plagiarism detection, and determining software ownership. Several methods of source code authorship attribution have been proposed. According to the only known controlled, comprehensive comparative study of these methods, the two most effective are the Burrows method and the SCAP method. This paper presents a partial replication of that comparative study: it compares only those two methods. It also slightly extends the study: the original considered only anonymized data, while the replication considers both anonymized and non-anonymized data. The original study indicated that the Burrows method outperformed all other methods, including the SCAP method, by a considerable margin. The results of the replication, however, indicate that the SCAP method outperforms the Burrows method by a small margin on anonymized data and by a large margin on non-anonymized data.