It is crucial for a software manager to know whether or not a bug prediction model can be relied on. A wrong prediction of the number or location of future bugs can jeopardize a project's goals. In this paper we first verify, both visually and statistically, that a bug prediction model's accuracy varies over time. We then explore the reasons for this variability, which includes alternating periods of stable and unstable prediction quality, and formulate a decision procedure for evaluating prediction models before applying them. To exemplify our findings, we use data from four open source projects and empirically identify several project features that influence defect prediction quality. Specifically, we observed that changes in the number of authors editing a file and in the number of defects they fix influence prediction quality. Finally, we introduce an approach for estimating the accuracy of prediction models that helps a project manager decide when to rely on such a model. Our findings suggest that one should be aware of these periods of stable and unstable prediction quality and should use approaches such as ours to assess a model's accuracy in advance.
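The abstract says prediction-quality variability is checked both visually and statistically, but does not spell out the procedure. The following minimal Python sketch illustrates one plausible reading of that idea: score a time-ordered stream of defect predictions in fixed windows, then flag windows whose accuracy deviates sharply from the overall mean. The window size, the z-score threshold, the hit-rate metric, and all names and data here are illustrative assumptions, not the authors' actual method.

```python
# Minimal sketch: windowed accuracy of a defect prediction model over time,
# with a crude statistical flag for unstable periods. All parameters and
# data below are hypothetical, chosen only to illustrate the idea.

from statistics import mean, stdev

def windowed_accuracy(predictions, actuals, window=10):
    """Per-window hit rate of binary defect predictions, in time order."""
    accs = []
    for start in range(0, len(predictions) - window + 1, window):
        hits = sum(p == a for p, a in
                   zip(predictions[start:start + window],
                       actuals[start:start + window]))
        accs.append(hits / window)
    return accs

def flag_unstable_windows(accs, z_threshold=2.0):
    """Flag windows whose accuracy lies more than z_threshold standard
    deviations from the mean -- a stand-in for the statistical
    variability check the abstract alludes to."""
    mu, sigma = mean(accs), stdev(accs)
    return [i for i, a in enumerate(accs)
            if sigma > 0 and abs(a - mu) / sigma > z_threshold]

# Synthetic, time-ordered example where prediction quality collapses late:
preds = [1, 0, 1, 0, 1] * 16                          # 80 predictions
actuals = preds[:70] + [1 - p for p in preds[70:]]    # last window all wrong

accs = windowed_accuracy(preds, actuals, window=10)
print("per-window accuracy:", accs)                   # seven 1.0s, then 0.0
print("unstable windows:", flag_unstable_windows(accs))  # flags window 7
```

In this toy run, the final window's accuracy drops from 1.0 to 0.0 and is flagged; in practice one would replace the z-score heuristic with a proper statistical test and relate flagged periods to project features such as the per-file author count the abstract mentions.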