Usage of multiple prediction models based on defect categories

Authors:
Bora Caglayan; Ayse Tosun; Andriy Miranskyy; Ayse Bener; Nuzio Ruffolo
Affiliations:
Boğaziçi University, Istanbul, Turkey; Boğaziçi University, Istanbul, Turkey; IBM Canada Ltd., Toronto, Canada; Ryerson University, Toronto, Canada; IBM Canada Ltd., Toronto, Canada
Venue:
Proceedings of the 6th International Conference on Predictive Models in Software Engineering
Year:
2010

Abstract

Background: Most defect prediction models are built for one of two purposes: 1) to separate defective from defect-free modules (binary classification), or 2) to estimate the number of defects (regression analysis). It would also be useful to provide more information on the nature of defects so that software managers can plan their testing resources more effectively.

Aims: In this paper, we propose a defect prediction model that is based on defect categories.

Method: We mined the version history of a large-scale enterprise software product to extract churn and static code metrics, and grouped the defects into three categories according to different testing phases. We built a learning-based model for each defect category and compared the performance of this approach with that of a general model. We applied statistical techniques to evaluate the relationship between defect categories and software metrics. We also tested our hypothesis by replicating the empirical work on Eclipse data.

Results: Our results show that building models that are sensitive to defect categories is cost-effective: it reveals more information and increases detection rates (pd) by 10% while keeping false alarm rates (pf) constant.

Conclusions: We conclude that slicing defect data and categorizing it for use in a defect prediction model would enable practitioners to take immediate action. Our Eclipse replication showed that haphazard categorization of defects is not worth the effort.
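The abstract compares models by detection rate (pd) and false alarm rate (pf). The minimal Python sketch below computes both rates using the definitions common in the defect prediction literature: pd = TP / (TP + FN) and pf = FP / (FP + TN). It also shows one hypothetical way to merge the binary outputs of several category-specific models into a single module-level verdict. The "flag if any category model flags it" rule, the function names, and the category labels are illustrative assumptions, not the procedure reported in the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Rates:
    pd: float  # probability of detection: TP / (TP + FN)
    pf: float  # probability of false alarm: FP / (FP + TN)


def pd_pf(actual: Sequence[bool], predicted: Sequence[bool]) -> Rates:
    """Compute pd and pf for binary defect predictions (True = defective)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    # Guard against empty classes so the rates stay defined.
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return Rates(pd, pf)


def combine_category_predictions(per_category: Dict[str, List[bool]]) -> List[bool]:
    """Hypothetical combination rule (an assumption, not the paper's method):
    a module is flagged defective if ANY category-specific model flags it."""
    columns = list(per_category.values())
    if not columns:
        return []
    if len({len(col) for col in columns}) != 1:
        raise ValueError("all category predictions must cover the same modules")
    # zip(*columns) yields one tuple of per-category flags for each module.
    return [any(flags) for flags in zip(*columns)]
```

For example, with three hypothetical category models named `category_A`, `category_B`, and `category_C` scoring the same six modules, `combine_category_predictions` returns one flag per module, and `pd_pf(actual, combined)` gives the rates that would be compared against a single general model at a fixed pf. The OR rule favors detection over false alarms; a stricter rule (for example, requiring agreement between models) would trade pd for lower pf.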