Usage of multiple prediction models based on defect categories

  • Authors:
  • Bora Caglayan;Ayse Tosun;Andriy Miranskyy;Ayse Bener;Nuzio Ruffolo

  • Affiliations:
  • Boğaziçi University, Istanbul, Turkey;Boğaziçi University, Istanbul, Turkey;IBM Canada Ltd., Toronto, Canada;Ryerson University, Toronto, Canada;IBM Canada Ltd., Toronto, Canada

  • Venue:
  • Proceedings of the 6th International Conference on Predictive Models in Software Engineering
  • Year:
  • 2010

Abstract

Background: Most defect prediction models are built for one of two purposes: 1) to distinguish defective from defect-free modules (binary classification), or 2) to estimate the number of defects (regression analysis). It would also be useful to provide more information on the nature of defects so that software managers can plan their testing resources more effectively. Aims: In this paper, we propose a defect prediction model that is based on defect categories. Method: We mined the version history of a large-scale enterprise software product to extract churn and static code metrics, and grouped the defects into three categories according to the testing phases. We built a learning-based model for each defect category and compared the performance of the proposed model with that of a general one. We applied statistical techniques to evaluate the relationship between defect categories and software metrics. We also tested our hypothesis by replicating the empirical work on Eclipse data. Results: Our results show that building models that are sensitive to defect categories is cost-effective in the sense that it reveals more information and increases detection rates (pd) by 10% while keeping false alarms (pf) constant. Conclusions: We conclude that slicing defect data and categorizing it for use in a defect prediction model would enable practitioners to take immediate actions. Our results on the Eclipse replication showed that haphazard categorization of defects is not worth the effort.
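
The abstract reports results as detection rate (pd) and false alarm rate (pf) without defining them. As a minimal sketch, assuming the definitions commonly used in the defect prediction literature (pd = TP / (TP + FN), pf = FP / (FP + TN)), the two rates can be computed as follows. The function name `pd_pf` and all labels are hypothetical and for illustration only; they are not data or code from the paper.

```python
def pd_pf(actual, predicted):
    """Return (pd, pf) for binary labels where 1 = defective, 0 = defect-free.

    Assumed definitions (not spelled out in the abstract):
      pd = TP / (TP + FN)   probability of detection (recall)
      pf = FP / (FP + TN)   probability of false alarm
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    # Guard against empty classes so the sketch never divides by zero.
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return pd, pf


# Hypothetical labels for ten modules (illustrative only).
actual        = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
general_pred  = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]  # one general model
category_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # union of per-category models

print("general :", pd_pf(actual, general_pred))
print("category:", pd_pf(actual, category_pred))
```

In this toy example the category-based predictions find one more defective module at the same number of false alarms, which is the kind of comparison the Results sentence describes: a higher pd at a constant pf.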