Tracking concept drift of software projects using defect prediction quality

Authors:
Jayalath Ekanayake;Jonas Tappolet;Harald C. Gall;Abraham Bernstein
Affiliations:
Dynamic and Distributed Systems Group, Department of Informatics, University of Zurich, Switzerland;Dynamic and Distributed Systems Group, Department of Informatics, University of Zurich, Switzerland;Software Evolution and Architecture Lab, Department of Informatics, University of Zurich, Switzerland;Dynamic and Distributed Systems Group, Department of Informatics, University of Zurich, Switzerland
Venue:
MSR '09 Proceedings of the 2009 6th IEEE International Working Conference on Mining Software Repositories
Year:
2009

Abstract

Defect prediction is an important task in mining software repositories, but the quality of predictions varies strongly within and across software projects. In this paper we investigate why prediction quality fluctuates so strongly as the nature of the bug (or defect) fixing process changes. To this end, we adopt the notion of concept drift, which denotes that a defect prediction model has become unsuitable because the set of influencing features has changed, usually due to a change in the underlying bug generation process (i.e., the concept). We explore four open source projects (Eclipse, OpenOffice, Netbeans and Mozilla) and construct file-level and project-level features for each of them from their respective CVS and Bugzilla repositories. We then use this data to build defect prediction models and visualize the prediction quality along the time axis. These visualizations allow us to identify concept drifts and, as a consequence, phases of stability and instability expressed in the level of defect prediction quality. Further, we identify the project features that influence defect prediction quality using both a tree-induction algorithm and a linear regression model. Our experiments reveal that software systems are subject to considerable concept drift over their evolution history. Specifically, we observe that the change in the number of authors editing a file and the number of defects fixed by them contribute to a project's concept drift and therefore influence the defect prediction quality. Our findings suggest that project managers who use defect prediction models for decision making should be aware of whether the project is currently in a phase of stability or instability due to a potential concept drift.
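The idea of tracking prediction quality along the time axis can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the learner, the quality measure, the window size, or any threshold, so everything below is an assumption made for illustration. The sketch uses a toy one-feature "model" (`fit_direction`), AUC as the quality measure, a sliding training window of two periods, and a hypothetical cut-off of 0.7 to label a phase as unstable. The data is synthetic; the single feature merely stands in for a file-level feature such as the number of authors editing a file.

```python
from typing import List, Sequence, Tuple

# One sample per file: (feature value, 1 if the file had a defect else 0).
Sample = Tuple[float, int]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")  # undefined when one class is missing
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_direction(train: Sequence[Sample]) -> float:
    """Toy model (assumption): +1 if defective files have the larger mean
    feature value in the training data, otherwise -1."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        return 1.0
    return 1.0 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1.0


def quality_over_time(periods: Sequence[Sequence[Sample]],
                      window: int) -> List[float]:
    """Train on `window` consecutive periods, test on the period after them."""
    series = []
    for t in range(window, len(periods)):
        train = [s for p in periods[t - window:t] for s in p]
        sign = fit_direction(train)
        test = periods[t]
        series.append(auc([sign * x for x, _ in test],
                          [y for _, y in test]))
    return series


def unstable_phases(series: Sequence[float],
                    threshold: float = 0.7) -> List[int]:
    """Indices of test periods whose quality falls below the assumed cut-off."""
    return [i for i, q in enumerate(series) if q < threshold]


# Synthetic history: the relation between feature and defects flips halfway,
# which mimics a change in the underlying bug generation process.
stable = [(5.0, 1), (4.0, 1), (1.0, 0), (2.0, 0)]
drifted = [(1.0, 1), (2.0, 1), (5.0, 0), (4.0, 0)]
periods = [stable, stable, stable, drifted, drifted, drifted]

series = quality_over_time(periods, window=2)
print(series)                   # [1.0, 0.0, 0.0, 1.0]
print(unstable_phases(series))  # [1, 2]
```

In this toy history the quality collapses in the two periods right after the flip and recovers once the training window contains only post-drift data, which is the kind of stable and unstable phase pattern the visualizations are meant to expose. A real study would replace the toy model with an actual classifier and derive the features from the version control and bug tracking data.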