Ecological inference in empirical software engineering

  • Authors:
  • Daryl Posnett;Vladimir Filkov;Premkumar Devanbu

  • Affiliations:
  • Department of Computer Science, University of California, Davis, USA;Department of Computer Science, University of California, Davis, USA;Department of Computer Science, University of California, Davis, USA

  • Venue:
  • ASE '11 Proceedings of the 2011 26th IEEE/ACM International Conference on Automated Software Engineering
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Software systems are decomposed hierarchically, for example, into modules, packages and files. This hierarchical decomposition has a profound influence on evolvability, maintainability and work assignment. Hierarchical decomposition is thus clearly of central concern for empirical software engineering researchers; but it also poses a quandary. At what level do we study phenomena, such as quality, distribution, collaboration and productivity? At the level of files? packages? or modules? How does the level of study affect the truth, meaning, and relevance of the findings? In other fields it has been found that choosing the wrong level might lead to misleading or fallacious results. Choosing a proper level, for study, is thus vitally important for empirical software engineering research; but this issue hasn't thus far been explicitly investigated. We describe the related idea of ecological inference and ecological fallacy from sociology and epidemiology, and explore its relevance to empirical software engineering; we also present some case studies, using defect and process data from 18 open source projects to illustrate the risks of modeling at an aggregation level in the context of defect prediction, as well as in hypothesis testing.