Defect prediction from static code features: current results, limitations, new approaches

  • Authors:
  • Tim Menzies; Zach Milton; Burak Turhan; Bojan Cukic; Yue Jiang; Ayşe Bener

  • Affiliations:
  • West Virginia University, Morgantown, USA; West Virginia University, Morgantown, USA; University of Oulu, Oulu, Finland; West Virginia University, Morgantown, USA; West Virginia University, Morgantown, USA; Boğaziçi University, Istanbul, Turkey

  • Venue:
  • Automated Software Engineering
  • Year:
  • 2010

Abstract

Building quality software is expensive and software quality assurance (QA) budgets are limited. Data miners can learn defect predictors from static code features, which can be used to control QA resources; e.g. to focus on the parts of the code predicted to be more defective.

Recent results show that better data mining technology is not leading to better defect predictors. We hypothesize that we have reached the limits of the standard learning goal of maximizing "AUC(pd, pf)"; i.e. the area under the curve of probability of false alarm versus probability of detection.

Accordingly, we explore changing the standard goal. Learners that maximize "AUC(effort, pd)" find the smallest set of modules that contain the most errors. WHICH is a meta-learner framework that can be quickly customized to different goals. When customized to AUC(effort, pd), WHICH out-performs all the data mining methods studied here. More importantly, measured in terms of this new goal, certain widely used learners perform much worse than simple manual methods.

Hence, we advise against the indiscriminate use of learners. Learners must be chosen and customized to the goal at hand. With the right architecture (e.g. WHICH), tuning a learner to specific local business goals can be a simple task.
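To make the alternative goal concrete, below is a minimal sketch (not the paper's implementation) of how an AUC(effort, pd) score can be computed: modules are ranked by predicted defect-proneness, inspection effort is approximated by lines of code read so far, and the area is taken under the resulting effort-versus-detection curve. The function name, parameter names, and toy data are illustrative assumptions.

```python
import numpy as np

def auc_effort_pd(loc, defective, scores):
    """Hypothetical sketch: area under the effort-vs-pd curve.

    loc       : lines of code per module (proxy for inspection effort)
    defective : 1 if the module is defective, else 0
    scores    : predictor output; higher means "inspect sooner"
    """
    order = np.argsort(-np.asarray(scores, dtype=float))  # inspect highest-scored modules first
    loc = np.asarray(loc, dtype=float)[order]
    defective = np.asarray(defective, dtype=float)[order]

    # x-axis: cumulative fraction of LOC inspected; y-axis: cumulative fraction of defects found
    effort = np.concatenate(([0.0], np.cumsum(loc) / loc.sum()))
    pd = np.concatenate(([0.0], np.cumsum(defective) / defective.sum()))

    return np.trapz(pd, effort)  # trapezoidal area under the curve

# Toy data (hypothetical): five modules, two of them defective.
print(auc_effort_pd(loc=[100, 400, 50, 250, 200],
                    defective=[1, 0, 1, 0, 0],
                    scores=[0.9, 0.2, 0.8, 0.4, 0.1]))
```

Under this goal, a predictor scores well when it ranks small, defect-dense modules first, so most defects are found after inspecting only a small fraction of the code; this is the sense in which maximizing AUC(effort, pd) "finds the smallest set of modules that contain the most errors."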