In a large software system, knowing which files are most likely to be fault-prone is valuable information for project managers, who can use it to prioritize software testing and allocate resources accordingly. However, our experience shows that collecting and analyzing fine-grained test defects in a large and complex software system is difficult. On the other hand, previous research has shown that companies unable to collect local data can safely use cross-company data with nearest-neighbor sampling to predict their defects. In this study we analyzed 25 projects of a large telecommunication system. To predict the defect-proneness of modules, we trained models on publicly available NASA MDP data. In our experiments we used static call graph based ranking (CGBR) as well as nearest-neighbor sampling to construct method-level defect predictors. Our results suggest that, for the analyzed projects, at least 70% of the defects can be detected by inspecting only (i) 6% of the code using a Naive Bayes model, or (ii) 3% of the code using the CGBR framework.
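The cross-company pipeline described above (filter the external training data down to the instances nearest the local modules, then fit a Naive Bayes predictor on static code metrics) can be illustrated with a minimal sketch. Everything here is an illustrative assumption: the two toy metrics (LOC, cyclomatic complexity), the synthetic data, and the helper names `nn_sample` and `GaussianNB` do not come from the paper or the NASA MDP datasets.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nn_sample(cross_rows, local_feats, k=10):
    """Nearest-neighbor sampling: keep the union of the k cross-company
    rows closest to each (unlabeled) local module."""
    keep = set()
    for lf in local_feats:
        order = sorted(range(len(cross_rows)),
                       key=lambda i: euclidean(cross_rows[i][0], lf))
        keep.update(order[:k])
    return [cross_rows[i] for i in sorted(keep)]

def _mean_var(col):
    m = sum(col) / len(col)
    v = sum((x - m) ** 2 for x in col) / len(col)
    return m, max(v, 1e-9)  # floor the variance to avoid division by zero

class GaussianNB:
    """Tiny Gaussian Naive Bayes: per-class feature means/variances."""
    def fit(self, rows):
        by_cls = {}
        for feats, label in rows:
            by_cls.setdefault(label, []).append(feats)
        self.priors = {c: len(g) / len(rows) for c, g in by_cls.items()}
        self.stats = {c: [_mean_var(col) for col in zip(*g)]
                      for c, g in by_cls.items()}
        return self

    def predict(self, feats):
        best, best_lp = None, -math.inf
        for label, stats in self.stats.items():
            lp = math.log(self.priors[label])
            for x, (mu, var) in zip(feats, stats):
                lp += -0.5 * math.log(2 * math.pi * var) \
                      - (x - mu) ** 2 / (2 * var)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

random.seed(0)
# Synthetic "cross-company" rows: ((LOC, complexity), defective?)
cross = ([((random.gauss(200, 50), random.gauss(15, 4)), 1) for _ in range(40)]
         + [((random.gauss(60, 20), random.gauss(4, 2)), 0) for _ in range(40)])
local = [(70, 5), (210, 14)]  # unlabeled local modules to rank

model = GaussianNB().fit(nn_sample(cross, local, k=10))
preds = [model.predict(f) for f in local]
print(preds)
```

A predictor like this only ranks modules by defect-proneness; the inspection-effort figures in the abstract come from then ordering modules by predicted risk and counting how much code must be read to reach a given fraction of the defects.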