Sampling program quality

  • Authors:
  • Hongyu Zhang; Rongxin Wu

  • Affiliations:
  • School of Software, Tsinghua University, Beijing 100084, China; School of Software, Tsinghua University, Beijing 100084, China

  • Venue:
  • ICSM '10 Proceedings of the 2010 IEEE International Conference on Software Maintenance
  • Year:
  • 2010

Abstract

Many modern software systems are large, consisting of hundreds or even thousands of programs (source files). Understanding the overall quality of these programs is a resource- and time-consuming activity. It is desirable to have a quick yet accurate estimate of the overall program quality in a cost-effective manner. In this paper, we propose a sampling-based approach: for a large software project, we sample only a small percentage of source files and then estimate the quality of all programs in the project based on the characteristics of the sample. Through experiments on public defect datasets, we show that we can successfully estimate the total number of defects, the proportion of defective programs, defect distributions, and defect-proneness, all from a small sample of programs. Our experiments also show that small samples can achieve prediction accuracy similar to that of larger samples.
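
The sampling idea in the abstract can be illustrated with a minimal sketch. It assumes simple random sampling without replacement and the textbook expansion estimator for a population total; the abstract does not say which estimators the paper uses, so this is an illustration and not the authors' method. The function name `estimate_quality` and the synthetic defect counts in the demo are hypothetical.

```python
import math
import random


def estimate_quality(sample_defects, population_size):
    """Estimate project-wide defect figures from a sample of files.

    sample_defects: defect counts of the sampled files (one int per file).
    population_size: total number of source files in the project.
    Assumes the sample was drawn by simple random sampling without
    replacement (an assumption of this sketch, not taken from the paper).
    """
    n = len(sample_defects)
    if n < 2 or n > population_size:
        raise ValueError("need 2 <= sample size <= population size")

    mean = sum(sample_defects) / n
    # Unbiased sample variance of defects per file.
    variance = sum((x - mean) ** 2 for x in sample_defects) / (n - 1)
    # Finite population correction: the sample covers part of the project.
    fpc = 1 - n / population_size

    return {
        # Expansion estimator: scale the sample mean up to all files.
        "total_defects": population_size * mean,
        "total_defects_se": population_size * math.sqrt(fpc * variance / n),
        # Share of sampled files that contain at least one defect.
        "defective_proportion": sum(1 for x in sample_defects if x > 0) / n,
    }


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic project: 1000 files, most of them defect-free (made-up data).
    project = [rng.choice([0, 0, 0, 0, 1, 2, 5]) for _ in range(1000)]
    sample = rng.sample(project, 50)  # inspect only 5% of the files
    print(estimate_quality(sample, len(project)))
    print("actual total defects:", sum(project))
```

Comparing the estimated total with the actual total for different sample sizes gives a rough feel for how quickly the estimate stabilizes; the standard error shrinks roughly with the square root of the sample size.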