Defect, defect, defect: defect prediction 2.0

Authors:
Sunghun Kim
Affiliations:
The Hong Kong University of Science and Technology
Venue:
Proceedings of the 8th International Conference on Predictive Models in Software Engineering
Year:
2012

Citing 16
Cited 0

A Validation of Object-Oriented Design Metrics as Quality Indicators

IEEE Transactions on Software Engineering
Predicting Fault-Prone Software Modules in Telephone Switches

IEEE Transactions on Software Engineering
Predicting the Location and Number of Faults in Large Software Systems

IEEE Transactions on Software Engineering
Mining metrics to predict component failures

Proceedings of the 28th international conference on Software engineering
Data Mining Static Code Attributes to Learn Defect Predictors

IEEE Transactions on Software Engineering
Predicting Faults from Cached History

ICSE '07 Proceedings of the 29th international conference on Software Engineering
Using FindBugs on production software

Companion to the 22nd ACM SIGPLAN conference on Object-oriented programming systems and applications companion
A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction

Proceedings of the 30th international conference on Software engineering
Predicting defects using network analysis on dependency graphs

Proceedings of the 30th international conference on Software engineering
Classifying Software Changes: Clean or Buggy?

IEEE Transactions on Software Engineering
Predicting faults using the complexity of code changes

ICSE '09 Proceedings of the 31st International Conference on Software Engineering
Cross-project defect prediction: a large scale experiment on data vs. domain vs. process

Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Fair and balanced?: bias in bug-fix datasets

Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
On the relative value of cross-company and within-company data for defect prediction

Empirical Software Engineering
Failure is a four-letter word: a parody in empirical research

Proceedings of the 7th International Conference on Predictive Models in Software Engineering
Micro interaction metrics for defect prediction

Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

Defect prediction has been a very active research area in software engineering [6--8, 11, 13, 16, 19, 20]. In 1971, Akiyama proposed one of the earliest defect prediction models using Lines of Code (LOC) [1]: "Defect = 4.86 + 0.018LOC." Since then, many effective new defect prediction models and metrics have been proposed. For the prediction models, typical machine learners and regression algorithms such as Naive Bayes, Decision Tree, and Linear Regression are widely used. On the other hand, Kim et al. proposed a cache-based prediction model using bug occurrence properties [9]. Hassan proposed a change entropy model to effectively predict defects [6]. Recently, Bettenburg et al. proposed Multivariate Adaptive Regression Splines to improve defect prediction models by learning from local and global properties together [4]. Besides LOC, many new effective metrics for defect prediction have been proposed. Among them, source code metrics and change history metrics are widely used and yield reasonable defect prediction accuracy. For example, Basili et al. [3] used Chidamber and Kemerer metrics, and Ohlsson et al. [14] used McCabe's cyclomatic complexity for defect prediction. Moser et al. [12] used the number of revisions, authors, and past fixes, and age of a file as defect predictors. Recently, micro interaction metrics (MIMs) [10] and source code quality measures [15] for effective defect prediction are proposed. However, there is much room to improve for defect prediction 2.0. First of all, understanding the actual causes of defects is necessary. Without understanding them, we may reach to nonsensical conclusions from defect prediction results [18]. Many effective prediction models have been proposed, but successful application cases in practice are scarcely reported. To be more attractive for developers in practice, it is desirable to predict defects in finer granularity levels such as the code line or even keyword level. Note that static bug finders such as FindBugs [2] can identify potential bugs in the line level, and many developers find them useful in practice. Dealing with noise in defect data has become an important issue. Bird et al. identified there is non-neglectable noise in defect data [5]. This noise may yield poor and/or meaningless defect prediction results. Cross-prediction is highly desirable: for new projects or projects with limited training data, it is necessary to learn a prediction model using sufficient training data from other projects, and to apply the model to those projects. However, Zimmermann et al. [21] identified cross-project prediction is a challenging problem. Turhan et al. [17] analyzed Cross-Company (CC) and Within-Company (WC) data for defect prediction, and confirmed that it is challenging to reuse CC data directly to predict defects in other companies' software. Overall, defect prediction is a very interesting and promising research area. However, there are still many research challenges and problems to be addressed. Hopefully, this discussion calls new solutions and ideas to address these challenges.