Many modern software systems are large, consisting of hundreds or even thousands of programs (source files). Understanding the overall quality of these programs is a resource- and time-consuming activity. It is desirable to have a quick yet accurate estimate of overall program quality in a cost-effective manner. In this paper, we propose a sampling-based approach: for a large software project, we sample only a small percentage of source files and then estimate the quality of all programs in the project from the characteristics of the sample. Through experiments on public defect datasets, we show that we can successfully estimate the total number of defects, the proportion of defective programs, defect distributions, and defect-proneness, all from a small sample of programs. Our experiments also show that small samples can achieve prediction accuracy similar to that of larger samples.
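
The sampling idea can be illustrated with a minimal sketch. It is not the authors' method: it assumes simple random sampling without replacement and that inspecting a sampled file reveals its true defect count. The function name `estimate_from_sample`, its parameters, and the example data are all hypothetical.

```python
import random
from typing import Dict, Sequence


def estimate_from_sample(
    defect_counts: Sequence[int],
    sample_fraction: float,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate project-level defect statistics from a random sample of files.

    defect_counts holds one defect count per source file. In practice only
    the sampled files would be inspected; the full list is passed here purely
    so the sketch can draw a sample from it (an assumption of this example).
    """
    n_total = len(defect_counts)
    if n_total == 0:
        raise ValueError("defect_counts must not be empty")
    if not 0.0 < sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be in (0, 1]")

    # Sample at least one file, without replacement.
    n_sample = max(1, round(n_total * sample_fraction))
    rng = random.Random(seed)
    sample = rng.sample(list(defect_counts), n_sample)

    # Scale the sample mean up to the whole project.
    mean_defects = sum(sample) / n_sample
    # Fraction of sampled files that contain at least one defect.
    defective_share = sum(1 for count in sample if count > 0) / n_sample

    return {
        "sample_size": n_sample,
        "estimated_total_defects": mean_defects * n_total,
        "estimated_defective_proportion": defective_share,
    }


if __name__ == "__main__":
    # Hypothetical project: 1000 files, roughly 20% of them defective.
    data_rng = random.Random(42)
    counts = [data_rng.randint(1, 5) if data_rng.random() < 0.2 else 0
              for _ in range(1000)]
    print("true total defects:", sum(counts))
    print(estimate_from_sample(counts, sample_fraction=0.05))
```

With a 5% sample the estimates will vary from seed to seed; repeating the call with different seeds gives a rough feel for how sample size affects the spread of the estimate.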