Exploring the relationship of a file's history and its fault-proneness: An empirical method and its application to open source programs

Authors:
Timea Illes-Seifert; Barbara Paech
Affiliations:
Institute for Computer Science, University of Heidelberg, Im Neuenheimer Feld 326, D-69120 Heidelberg, Germany; Institute for Computer Science, University of Heidelberg, Im Neuenheimer Feld 326, D-69120 Heidelberg, Germany
Venue:
Information and Software Technology
Year:
2010


Abstract

Context: Knowledge about the characteristics of software that indicate defects is valuable for testers because it helps them focus the testing effort and allocate their limited resources appropriately. Objective: In this paper, we explore the relationship between several historical characteristics of files and their defect count. Method: We propose an empirical approach that uses statistical procedures and visual representations of the data to determine indicators for a file's defect count, and we apply this approach to nine open source Java projects across different versions. Results: Only 4 of the 9 programs show moderate correlations between a file's defects in previous and in current releases in more than half of the analysed releases. Contrary to our expectations, the oldest files are the most fault-prone. Additionally, late changes correlate only partly with a file's defect count. The number of changes, the number of distinct authors changing a file, and the file's age are good indicators for a file's defect count in all projects. Conclusion: Our results show that a software system's history is a good indicator of its quality. We did not find a single indicator that persists equally across all projects. Nevertheless, several indicators show significant strong correlations in nearly all projects: DA (number of distinct authors) and FC (frequency of change). In practice, statistical analyses have to be performed for each software system to identify the best indicator(s) for a file's defect count.
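
The kind of per-project analysis the abstract recommends can be illustrated with a minimal sketch. It assumes a Spearman rank correlation between each history characteristic and the defect count; the abstract only mentions "statistical procedures", so the choice of statistic is an assumption, as are the field names (`distinct_authors`, `changes`, `defects`) and the sample data, which are invented purely for illustration and are not results from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-file history data for one release of one project.
files = [
    {"name": "A.java", "distinct_authors": 1, "changes": 3, "defects": 0},
    {"name": "B.java", "distinct_authors": 2, "changes": 9, "defects": 1},
    {"name": "C.java", "distinct_authors": 2, "changes": 7, "defects": 2},
    {"name": "D.java", "distinct_authors": 4, "changes": 15, "defects": 3},
    {"name": "E.java", "distinct_authors": 5, "changes": 12, "defects": 6},
]

defects = [f["defects"] for f in files]
for indicator in ("distinct_authors", "changes"):
    rho = spearman([f[indicator] for f in files], defects)
    print(f"{indicator}: rho = {rho:.2f}")
```

Running such a computation separately for each project and release, and then comparing the coefficients across indicators, is one way to find which characteristic best tracks defect count in a given system. A rank-based statistic is used here because defect counts are typically skewed; a significance test would be needed before drawing conclusions.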