Extraction of Java program fingerprints for software authorship identification

  • Authors:
  • Haibiao Ding;Mansur H. Samadzadeh

  • Affiliations:
  • Department of Computer Science, Oklahoma State University, 218 MSCS, Stillwater, OK;Department of Computer Science, Oklahoma State University, 218 MSCS, Stillwater, OK

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2004

Abstract

Computer programs belong to the authors who design, write, and test them. Authorship identification is concerned with determining the likelihood that a particular author wrote some piece(s) of code, usually based on other code samples from the same programmer. Java is a popular object-oriented programming language. Programming fingerprints attempt to characterize the features that are unique to each programmer. In this study, we investigated the extraction of a set of software metrics from a given Java program that could be used as a fingerprint to identify the author of the Java code; the extraction was performed by a program written in Visual C++. The contributions of the selected metrics to authorship identification were measured by a statistical process, namely canonical discriminant analysis, using the statistical software package SAS. Of the 56 extracted metrics, 48 were identified as contributing to authorship identification. The authorship of 62.6-67.2% of the Java programs considered could be correctly identified with the extracted metrics, and the identification rate could be as high as 85.8% with derived canonical variates. Moreover, layout metrics played a more important role in authorship identification than the other metrics.
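As a rough illustration of what text-level "layout metrics" can look like, the sketch below computes a few indentation, blank-line, brace-placement, and comment-density measures from Java source text. It is written in Python rather than the Visual C++ tool used in the study, and the five metrics (`blank_line_ratio`, `mean_indent`, `tab_indent_ratio`, `brace_own_line_ratio`, `comment_line_ratio`) are hypothetical examples, not the paper's 56 metrics. The heuristics do not parse Java, so braces or comment markers inside string literals are counted too.

```python
from dataclasses import dataclass


@dataclass
class LayoutMetrics:
    """Hypothetical layout features for one Java source file (illustrative only)."""
    blank_line_ratio: float      # blank lines / all lines
    mean_indent: float           # mean leading-whitespace width (tabs expanded) over non-blank lines
    tab_indent_ratio: float      # indented lines whose indentation starts with a tab / indented lines
    brace_own_line_ratio: float  # lines consisting only of '{' / all '{' characters in the text
    comment_line_ratio: float    # non-blank lines starting with //, /*, or * / non-blank lines


def layout_metrics(source: str, tab_width: int = 4) -> LayoutMetrics:
    """Compute simple text-level layout metrics (no Java parsing)."""
    lines = source.splitlines()
    nonblank = [ln for ln in lines if ln.strip()]

    # Leading whitespace of each non-blank line; tabs expanded to a fixed width.
    prefixes = [ln[: len(ln) - len(ln.lstrip())] for ln in nonblank]
    widths = [len(p.expandtabs(tab_width)) for p in prefixes]
    indented = [p for p in prefixes if p]

    open_braces = source.count("{")
    own_line_braces = sum(1 for ln in lines if ln.strip() == "{")  # Allman-style braces
    comment_lines = sum(
        1 for ln in nonblank if ln.lstrip().startswith(("//", "/*", "*"))
    )

    def ratio(num: float, den: float) -> float:
        # Guard against empty inputs (e.g. a file with no braces or no lines).
        return num / den if den else 0.0

    return LayoutMetrics(
        blank_line_ratio=ratio(len(lines) - len(nonblank), len(lines)),
        mean_indent=ratio(sum(widths), len(widths)),
        tab_indent_ratio=ratio(sum(1 for p in indented if p.startswith("\t")), len(indented)),
        brace_own_line_ratio=ratio(own_line_braces, open_braces),
        comment_line_ratio=ratio(comment_lines, len(nonblank)),
    )


allman = """public class Hello
{
    // entry point
    public static void main(String[] args)
    {
        System.out.println("hi");
    }
}
"""
print(layout_metrics(allman))
# LayoutMetrics(blank_line_ratio=0.0, mean_indent=3.0, tab_indent_ratio=0.0, brace_own_line_ratio=1.0, comment_line_ratio=0.125)
```

In a setup like the one described, one such vector per program (together with non-layout metrics) would be the input to a discriminant procedure such as SAS's canonical discriminant analysis; that classification step is not shown here.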