Source code author identification with unsupervised feature learning

  • Authors:
  • Upul Bandara;Gamini Wijayarathna

  • Affiliations:
  • Department of Industrial Management, University of Kelaniya, Sri Lanka;Department of Industrial Management, University of Kelaniya, Sri Lanka

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2013

Quantified Score

Hi-index 0.10

Visualization

Abstract

Automatic identification of source code authors has many applications in different fields such as source code plagiarism detection, and law suit prosecution. This paper presents a new source code author identification system based on an unsupervised feature learning technique. As a method of extracting features from high dimensional data, unsupervised feature learning has obtained a great success in many fields such as character recognition and image classification. However, according to our knowledge it has not been applied for source code author identification systems. Therefore, we investigated an unsupervised feature learning technique called sparse auto-encoder as a method of extracting features from source code files. Our system was evaluated with several datasets and results have shown that performance is very close to the state of art techniques in the source code identification field.