Defect density and defect prediction are essential for efficient resource allocation in software evolution. In an empirical study we applied data mining techniques for value series based on evolution attributes such as number of authors, commit messages, lines of code, bug fix count, etc. Daily data points of these evolution attributes were captured over a period of two months to predict the defects in the subsequent two months in a project. For that, we developed models utilizing genetic programming and linear regression to accurately predict software defects. In our study, we investigated the data of three independent projects, two open source and one commercial software system. The results show that by utilizing series of these attributes we obtain models with high correlation coefficients (between 0.716 and 0.946). Further, we argue that prediction models based on series of a single variable are sometimes superior to the model including all attributes: in contrast to other studies that resulted in size or complexity measures as predictors, we have identified the number of authors and the number of commit messages to versioning systems as excellent predictors of defect densities.
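To make the single-variable idea concrete, the following is a minimal sketch, not the study's actual pipeline. It assumes each file's daily author-count series is collapsed to its mean (a simplification chosen here for brevity), fits an ordinary least-squares line against the defect counts of the following period, and reports the Pearson correlation coefficient. The file names, series values, and defect counts are invented toy data, and the helper names `summarize`, `fit_line`, and `pearson` are hypothetical.

```python
from math import sqrt


def summarize(series):
    """Collapse a daily value series into one feature (assumption: the mean)."""
    return sum(series) / len(series)


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Invented toy data: daily number of distinct authors per file
# during the observation window (shortened to four days here).
author_series = {
    "a.c": [1, 1, 2, 2],
    "b.c": [2, 3, 3, 4],
    "c.c": [1, 1, 1, 1],
    "d.c": [4, 4, 5, 5],
}
# Invented toy data: defects reported per file in the following window.
later_defects = {"a.c": 3, "b.c": 6, "c.c": 2, "d.c": 10}

files = sorted(author_series)
xs = [summarize(author_series[f]) for f in files]
ys = [later_defects[f] for f in files]

slope, intercept = fit_line(xs, ys)
r = pearson(xs, ys)
predicted = [slope * x + intercept for x in xs]

print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

A real replication would keep the full two-month series rather than a single summary value and would evaluate on held-out data; the sketch only illustrates how one evolution attribute can be related to later defect counts.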