High-impact defects: a study of breakage and surprise defects

  • Authors:
  • Emad Shihab;Audris Mockus;Yasutaka Kamei;Bram Adams;Ahmed E. Hassan

  • Affiliations:
  • Queen's Univeristy, Kingston, ON, Canada;Avaya Labs Research, Basking Ridge, NJ, USA;Queen's University, Kingston, ON, Canada;Queen's University, Kingston, ON, Canada;Queen's University, Kingston, ON, ON, Canada

  • Venue:
  • Proceedings of the 19th ACM SIGSOFT symposium and the 13th European conference on Foundations of software engineering
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The relationship between various software-related phenomena (e.g., code complexity) and post-release software defects has been thoroughly examined. However, to date these predictions have a limited adoption in practice. The most commonly cited reason is that the prediction identifies too much code to review without distinguishing the impact of these defects. Our aim is to address this drawback by focusing on high-impact defects for customers and practitioners. Customers are highly impacted by defects that break pre-existing functionality (breakage defects), whereas practitioners are caught off-guard by defects in files that had relatively few pre-release changes (surprise defects). The large commercial software system that we study already had an established concept of breakages as the highest-impact defects, however, the concept of surprises is novel and not as well established. We find that surprise defects are related to incomplete requirements and that the common assumption that a fix is caused by a previous change does not hold in this project. We then fit prediction models that are effective at identifying files containing breakages and surprises. The number of pre-release defects and file size are good indicators of breakages, whereas the number of co-changed files and the amount of time between the latest pre-release change and the release date are good indicators of surprises. Although our prediction models are effective at identifying files that have breakages and surprises, we learn that the prediction should also identify the nature or type of defects, with each type being specific enough to be easily identified and repaired.