An empirical investigation into a large-scale Java open source code repository

  • Authors:
  • Mark Grechanik; Collin McMillan; Luca DeFerrari; Marco Comi; Stefano Crespi; Denys Poshyvanyk; Chen Fu; Qing Xie; Carlo Ghezzi

  • Affiliations:
  • Accenture Technology Labs, Chicago, IL; The College of William and Mary, Williamsburg, VA; Politecnico di Milano, Milano, Italy; Politecnico di Milano, Milano, Italy; Politecnico di Milano, Milano, Italy; The College of William and Mary; Accenture Technology Labs, Chicago, IL; Accenture Technology Labs, Chicago, IL; Politecnico di Milano, Milano, Italy

  • Venue:
  • Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement
  • Year:
  • 2010

Abstract

Getting insight into different aspects of source code artifacts is increasingly important -- yet there is little empirical research using large bodies of source code, and consequently there is not much statistically significant evidence of common patterns and facts about how programmers write source code. We pose 32 research questions, explain the rationale behind them, and obtain facts from 2,080 randomly chosen Java applications from SourceForge. Among these facts, we find that most methods have one or zero arguments or do not return any values, few methods are overridden, most inheritance hierarchies have a depth of one, close to 50% of classes do not explicitly inherit from any class, and the number of methods is strongly correlated with the number of fields in a class.
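
To make the kinds of facts listed in the abstract concrete, the following is a minimal Java sketch of how such per-class measurements could be taken. It is not the authors' tooling: the abstract does not say how the applications were analyzed, and this sketch simply assumes reflection over loaded classes. The class name `ClassMetrics`, its helper methods, the sample classes `Base` and `Derived`, and the depth convention (superclasses counted up to, but excluding, `java.lang.Object`) are all hypothetical choices made for illustration.

```java
import java.lang.reflect.Method;

// Illustrative sketch only; names and conventions here are assumptions,
// not taken from the paper.
public class ClassMetrics {

    // Hypothetical sample classes used to exercise the metrics.
    static class Base {
        int a;
        void f() {}
        int g(int x) { return x; }
    }

    static class Derived extends Base {
        int b;
        int h(int x, int y) { return x + y; }
    }

    /** Counts declared (non-synthetic) methods that take zero or one parameter. */
    static long fewArgMethods(Class<?> c) {
        long n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isSynthetic() && m.getParameterCount() <= 1) n++;
        }
        return n;
    }

    /** Counts declared (non-synthetic) methods whose return type is void. */
    static long voidMethods(Class<?> c) {
        long n = 0;
        for (Method m : c.getDeclaredMethods()) {
            if (!m.isSynthetic() && m.getReturnType() == void.class) n++;
        }
        return n;
    }

    /** Assumed convention: number of superclasses below java.lang.Object. */
    static int inheritanceDepth(Class<?> c) {
        int depth = 0;
        for (Class<?> s = c.getSuperclass(); s != null && s != Object.class; s = s.getSuperclass()) {
            depth++;
        }
        return depth;
    }

    /** Pearson correlation coefficient of two equally long samples. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("samples must be non-empty and of equal length");
        }
        double meanX = 0, meanY = 0;
        for (int i = 0; i < x.length; i++) { meanX += x[i]; meanY += y[i]; }
        meanX /= x.length;
        meanY /= y.length;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX, dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        // Result is NaN if either sample has zero variance.
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        System.out.println("few-arg methods in Base: " + fewArgMethods(Base.class));
        System.out.println("void methods in Base: " + voidMethods(Base.class));
        System.out.println("inheritance depth of Derived: " + inheritanceDepth(Derived.class));
    }
}
```

In a study of this kind, a correlation between method and field counts could be estimated by collecting one (methods, fields) pair per class and passing the two arrays to a function such as `pearson`; whether the paper used Pearson's coefficient or another statistic is not stated in the abstract.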