Predicting the defects in the next release of a large software system is valuable to a project manager who must plan her resources. In this paper we argue that temporal features (or aspects) of the data are central to prediction performance. We also argue that non-linear models, as opposed to traditional regression, are needed to uncover some of the hidden interrelationships between the features and the defects and, in some cases, to maintain prediction accuracy. Using data obtained from the CVS and Bugzilla repositories of the Eclipse project, we extract a number of temporal features, such as the number of revisions and the number of reported issues within the last three months. We then use these data to predict both the location of defects (i.e., the classes in which defects will occur) and the number of bugs reported in the next month of the project. To that end we use standard tree-based induction algorithms and compare them with traditional regression. Our non-linear models uncover the hidden relationships between features and defects and present them in an easy-to-understand form. Results also show that, using the temporal features, our prediction model can predict whether a source file will have a defect with an accuracy of 99% (area under the ROC curve 0.9251) and the number of defects with a mean absolute error of 0.019 (Spearman's correlation of 0.96).
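As an illustration of what a temporal feature of this kind might look like, the sketch below counts events per file inside a trailing time window. It is a minimal sketch and not the paper's actual extraction pipeline: the function name `count_recent_events`, the 90-day approximation of "three months", and the sample revision log are all assumptions made for the example.

```python
from collections import Counter
from datetime import date, timedelta


def count_recent_events(events, as_of, window_days=90):
    """Count events per file in the window_days leading up to as_of.

    events: iterable of (file_path, event_date) pairs, e.g. revisions
    or reported issues. The 90-day default is an assumed stand-in for
    "the last three months".
    """
    start = as_of - timedelta(days=window_days)
    counts = Counter()
    for path, when in events:
        # Keep only events in the half-open window (start, as_of].
        if start < when <= as_of:
            counts[path] += 1
    return counts


# Hypothetical revision log: (file, commit date).
revisions = [
    ("Parser.java", date(2007, 1, 10)),
    ("Parser.java", date(2007, 2, 20)),
    ("Parser.java", date(2006, 6, 1)),   # falls outside the window
    ("Lexer.java", date(2007, 3, 1)),
]

recent = count_recent_events(revisions, as_of=date(2007, 3, 15))
```

The same function could be applied to a list of issue reports to obtain the second feature mentioned above; the resulting per-file counts would then serve as input columns for a tree learner or a regression model.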