In this paper, we analyze data extracted from several open source software repositories. We observe that the change data follows a Zipf distribution. Based on the extracted data, we then develop three probabilistic models to predict which files will have changes or bugs. The first model is Maximum Likelihood Estimation (MLE), which simply counts the number of events (changes or bugs) that happen to each file and normalizes the counts to obtain a probability distribution. The second model is Reflexive Exponential Decay (RED), in which we postulate that the predictive rate of modification of a file is incremented by each modification to that file and then decays exponentially. The third model is called RED-Co-Change. With each modification to a given file, the RED-Co-Change model increments not only that file's predictive rate but also the rates of other files that are related to it through previous co-changes. We then present an information-theoretic approach to evaluating the performance of different prediction models. In this approach, cross entropy measures how close a model's distribution is to the actual, unknown probability distribution of the system. We evaluate our prediction models empirically with this approach on six large open source systems. Based on this evaluation, we observe that, of our three prediction models, RED-Co-Change predicts the distribution closest to the actual distribution for all the studied systems.
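As an illustration only, the MLE and RED models and the cross-entropy evaluation described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the models as implemented in the paper: the event format `(time, file)`, the function names, the `half_life` parameter, the small `smoothing` constant (added so that no file receives zero probability), and the toy data are all hypothetical. Cross entropy is estimated here as the average negative log2 probability that a model assigns to later observed events, and RED is computed as a single snapshot at time `now` rather than updated continuously. RED-Co-Change is omitted.

```python
import math


def mle_distribution(events, files, smoothing=1e-6):
    """Count events per file and normalize the counts to a distribution.

    `smoothing` is an assumption of this sketch: it keeps files with no
    recorded events at a small nonzero probability.
    """
    counts = {f: smoothing for f in files}
    for _, f in events:
        counts[f] += 1.0
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}


def red_distribution(events, files, now, half_life=30.0, smoothing=1e-6):
    """Each past event adds 1 to its file's rate; the contribution then
    decays exponentially with the event's age (hypothetical half-life)."""
    decay = math.log(2) / half_life
    rates = {f: smoothing for f in files}
    for t, f in events:
        rates[f] += math.exp(-decay * (now - t))
    total = sum(rates.values())
    return {f: r / total for f, r in rates.items()}


def cross_entropy(model, observed):
    """Average negative log2 probability the model assigns to the
    observed events, in bits per event (lower is better)."""
    return -sum(math.log2(model[f]) for _, f in observed) / len(observed)


# Toy data (hypothetical): a.c changed long ago, b.c changed recently.
files = ["a.c", "b.c", "c.c"]
history = [(0, "a.c"), (1, "a.c"), (2, "a.c"),
           (50, "b.c"), (58, "b.c"), (60, "b.c")]
future = [(61, "b.c"), (62, "b.c"), (63, "b.c")]

mle = mle_distribution(history, files)
red = red_distribution(history, files, now=60, half_life=10.0)

mle_bits = cross_entropy(mle, future)
red_bits = cross_entropy(red, future)
print(f"MLE: {mle_bits:.3f} bits/event, RED: {red_bits:.3f} bits/event")
```

In this toy history both files have the same event count, so MLE assigns them roughly equal probability, while RED weights the recent changes to `b.c` more heavily and therefore scores a lower cross entropy on the later `b.c` changes. With a different future (for example, renewed changes to `a.c`), the ordering can reverse.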