The use of Source Code Author Profiles (SCAP) represents a new, highly accurate approach to source code authorship identification that, unlike previous methods, is language independent. Accuracy is clearly a crucial requirement of any author identification method, but in cases of litigation over authorship, plagiarism, and the like, there is also a need to explain why a piece of code is attributed to a particular author. What is it about that piece of code that suggests a particular author? What features in the code make one author more likely than another? In this study, we use the SCAP method as a tool to identify the high-level features that contribute to source code authorship identification. A variety of features are considered for Java and Common Lisp, and the importance of each feature in determining authorship is measured through a sequence of experiments in which one feature is removed at a time. The results show that, for these programs, comments, layout features, and package-related naming influence classification accuracy, whereas user-defined naming, an obvious programmer-related feature, does not appear to influence accuracy. The relative feature contributions in programs written in the two languages are also compared.
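
The remove-one-feature experiment can be illustrated with a minimal Python sketch. The abstract does not spell out how SCAP works, so the sketch assumes the byte-level n-gram formulation commonly described for it: an author profile is the set of the most frequent n-grams in that author's code, and a sample is attributed to the author whose profile shares the most n-grams with it. The function names, the values `n=3` and `size=500`, and the crude regex-based comment stripper are illustrative assumptions, not the study's actual tooling.

```python
import re
from collections import Counter


def profile(text, n=3, size=500):
    """Set of the `size` most frequent byte-level n-grams in `text`."""
    data = text.encode("utf-8")
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return {gram for gram, _ in counts.most_common(size)}


def shared_ngrams(a, b):
    """Similarity score: number of n-grams the two profiles have in common."""
    return len(a & b)


def strip_comments(java_source):
    """Crude removal of /* */ and // comments (assumption: ignores string literals)."""
    no_block = re.sub(r"/\*.*?\*/", "", java_source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", no_block)


def classify(sample, author_profiles):
    """Attribute `sample` to the author with the largest profile overlap."""
    p = profile(sample)
    return max(author_profiles, key=lambda a: shared_ngrams(p, author_profiles[a]))


def accuracy(train, test, transform=lambda s: s):
    """Fraction of test samples attributed correctly.

    train: dict mapping author -> list of source strings
    test: list of (author, source) pairs
    transform: applied to every source first, e.g. to remove one feature
    """
    profiles = {
        author: profile("\n".join(transform(s) for s in sources))
        for author, sources in train.items()
    }
    hits = sum(classify(transform(src), profiles) == author for author, src in test)
    return hits / len(test)
```

Comparing `accuracy(train, test)` with `accuracy(train, test, transform=strip_comments)` gives a rough measure of how much comments contribute: a large drop suggests the feature carries authorship signal, while little or no change suggests it does not. Other features, such as layout or naming, would need their own hypothetical `transform` functions.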