High-impact defects: a study of breakage and surprise defects

Authors:
Emad Shihab;Audris Mockus;Yasutaka Kamei;Bram Adams;Ahmed E. Hassan
Affiliations:
Queen's University, Kingston, ON, Canada;Avaya Labs Research, Basking Ridge, NJ, USA;Queen's University, Kingston, ON, Canada;Queen's University, Kingston, ON, Canada;Queen's University, Kingston, ON, Canada
Venue:
Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
Year:
2011

Citing 30
Cited 7

An Analysis of Several Software Defect Models

IEEE Transactions on Software Engineering
A Validation of Object-Oriented Design Metrics as Quality Indicators

IEEE Transactions on Software Engineering
Predicting Fault-Prone Software Modules in Telephone Switches

IEEE Transactions on Software Engineering
Predicting Fault Incidence Using Software Change History

IEEE Transactions on Software Engineering
Does Code Decay? Assessing the Evidence from Change Management Data

IEEE Transactions on Software Engineering
Classification and evaluation of defects in a project retrospective

Journal of Systems and Software
A Metrics Suite for Object Oriented Design

IEEE Transactions on Software Engineering
Empirical Analysis of CK Metrics for Object-Oriented Design Complexity: Implications for Software Defects

IEEE Transactions on Software Engineering
A complexity measure

ICSE '76 Proceedings of the 2nd international conference on Software engineering
Detection of Logical Coupling Based on Product Release History

ICSM '98 Proceedings of the International Conference on Software Maintenance
Where the bugs are

ISSTA '04 Proceedings of the 2004 ACM SIGSOFT international symposium on Software testing and analysis
Use of relative code churn measures to predict system defect density

Proceedings of the 27th international conference on Software engineering
Predicting the Location and Number of Faults in Large Software Systems

IEEE Transactions on Software Engineering
Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction

IEEE Transactions on Software Engineering
Mining metrics to predict component failures

Proceedings of the 28th international conference on Software engineering
Predicting fault-prone components in a java legacy system

Proceedings of the 2006 ACM/IEEE international symposium on Empirical software engineering
Towards a Theoretical Model for Software Growth

MSR '07 Proceedings of the Fourth International Workshop on Mining Software Repositories
Predicting Defects for Eclipse

PROMISE '07 Proceedings of the Third International Workshop on Predictor Models in Software Engineering
Problems with Precision: A Response to "Comments on 'Data Mining Static Code Attributes to Learn Defect Predictors'"

IEEE Transactions on Software Engineering
A comparative analysis of the efficiency of change metrics and static code attributes for defect prediction

Proceedings of the 30th international conference on Software engineering
Future of Mining Software Archives: A Roundtable

IEEE Software
Revisiting the evaluation of defect prediction models

PROMISE '09 Proceedings of the 5th International Conference on Predictor Models in Software Engineering
Predicting faults using the complexity of code changes

ICSE '09 Proceedings of the 31st International Conference on Software Engineering
A systematic and comprehensive investigation of methods to build and evaluate fault prediction models

Journal of Systems and Software
Software Dependencies, Work Dependencies, and Their Impact on Failures

IEEE Transactions on Software Engineering
Searching for a Needle in a Haystack: Predicting Security Vulnerabilities for Windows Vista

ICST '10 Proceedings of the 2010 Third International Conference on Software Testing, Verification and Validation
Understanding the impact of code and process metrics on post-release defects: a case study on the Eclipse project

Proceedings of the 2010 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement
Revisiting common bug prediction findings using effort-aware models

ICSM '10 Proceedings of the 2010 IEEE International Conference on Software Maintenance
Pragmatic prioritization of software quality assurance efforts

Proceedings of the 33rd International Conference on Software Engineering
Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities

IEEE Transactions on Software Engineering

Using the GPGPU for scaling up mining software repositories

Proceedings of the 34th International Conference on Software Engineering
Method-level bug prediction

Proceedings of the ACM-IEEE international symposium on Empirical software engineering and measurement
Modular construction of an analysis tool for mining software repositories

Proceedings of the 12th annual international conference companion on Aspect-oriented software development
Transfer defect learning

Proceedings of the 2013 International Conference on Software Engineering
Studying the effect of co-change dispersion on software quality

Proceedings of the 2013 International Conference on Software Engineering
Discovering, reporting, and fixing performance bugs

Proceedings of the 10th Working Conference on Mining Software Repositories
Risky files: an approach to focus quality improvement effort

Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering

Abstract

The relationship between various software-related phenomena (e.g., code complexity) and post-release software defects has been thoroughly examined. However, to date, the resulting defect predictions have seen limited adoption in practice. The most commonly cited reason is that the predictions identify too much code to review without distinguishing the impact of the defects. Our aim is to address this drawback by focusing on defects that have a high impact on customers and practitioners. Customers are highly impacted by defects that break pre-existing functionality (breakage defects), whereas practitioners are caught off-guard by defects in files that had relatively few pre-release changes (surprise defects). The large commercial software system that we study already had an established concept of breakages as the highest-impact defects; however, the concept of surprises is novel and not as well established. We find that surprise defects are related to incomplete requirements and that the common assumption that a fix is caused by a previous change does not hold in this project. We then fit prediction models that are effective at identifying files containing breakages and surprises. The number of pre-release defects and file size are good indicators of breakages, whereas the number of co-changed files and the amount of time between the latest pre-release change and the release date are good indicators of surprises. Although our prediction models are effective at identifying files that have breakages and surprises, we learn that the prediction should also identify the nature or type of defects, with each type being specific enough to be easily identified and repaired.