An empirical investigation into a large-scale Java open source code repository

Authors:
Mark Grechanik;Collin McMillan;Luca DeFerrari;Marco Comi;Stefano Crespi;Denys Poshyvanyk;Chen Fu;Qing Xie;Carlo Ghezzi
Affiliations:
Accenture Technology Labs, Chicago, IL;The College of William and Mary, Williamsburg, VA;Politecnico di Milano, Milano, Italy;Politecnico di Milano, Milano, Italy;Politecnico di Milano, Milano, Italy;The College of William and Mary;Accenture Technology Labs, Chicago, IL;Accenture Technology Labs, Chicago, IL;Politecnico di Milano, Milano, Italy
Venue:
Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement
Year:
2010


Abstract

Getting insight into different aspects of source code artifacts is increasingly important, yet there is little empirical research using large bodies of source code, and consequently there is not much statistically significant evidence of common patterns and facts about how programmers write source code. We pose 32 research questions, explain the rationale behind them, and obtain facts from 2,080 randomly chosen Java applications from SourceForge. Among these facts, we find that most methods have one or zero arguments or do not return any values, few methods are overridden, most inheritance hierarchies have a depth of one, close to 50% of classes do not explicitly inherit from any class, and the number of methods is strongly correlated with the number of fields in a class.