Background: This paper describes an analysis conducted on a newly collected repository containing 92 versions of 38 proprietary, open-source and academic projects. An earlier preliminary study showed that a further in-depth analysis was needed to identify project clusters.
Aims: The goal of this research is to cluster software projects in order to identify groups of projects with similar characteristics from the defect prediction point of view. A single defect prediction model should work well for all projects that belong to such a group. The existence of these groups was investigated with statistical tests and by comparing the mean values of prediction efficiency.
Method: Hierarchical and k-means clustering, as well as Kohonen's neural network, were used to find groups of similar projects. The obtained clusters were investigated with discriminant analysis. For each identified group, a statistical analysis was conducted to determine whether the group really exists. Two defect prediction models were created for each identified group: the first was based on the projects that belong to the group, and the second on all the projects. Both models were then applied to all versions of the projects from the investigated group. If the predictions of the group-based model are significantly better than those of the all-projects model (comparing mean values and applying statistical tests), we conclude that the group really exists.
Results: Six clusters were identified, and the existence of two of them was statistically confirmed: 1) cluster proprietary B: T=19, p=0.035, r=0.40; 2) cluster proprietary/open: t(17)=3.18, p=0.05, r=0.59. According to Cohen's benchmarks for r (0.1 small, 0.3 medium, 0.5 large), these effect sizes are medium-to-large (r=0.40) and large (r=0.59), which is a substantial finding.
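The group-existence check described in the Method paragraph compares, version by version, the prediction efficiency of the group-based model with that of the all-projects model. The following is a minimal sketch of one way to do that comparison, not the authors' code. It assumes hypothetical per-version efficiency scores where higher means better, uses a paired t statistic (the abstract reports a T statistic for one cluster, so a non-parametric test was evidently used there), and converts t into r with the common formula r = sqrt(t² / (t² + df)). The abstract does not state how its r values were obtained.

```python
import math
from statistics import mean, stdev


def paired_t(group_model, all_model):
    """Paired t statistic and degrees of freedom for per-version efficiency scores.

    Both lists hold one score per project version of the investigated group;
    higher scores are assumed to mean better predictions (an assumption).
    """
    if len(group_model) != len(all_model) or len(group_model) < 2:
        raise ValueError("need two equally long score lists with at least two versions")
    diffs = [g - a for g, a in zip(group_model, all_model)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are equal; t is undefined")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def effect_size_r(t, df):
    """One common conversion of a t statistic into the effect size r."""
    return math.sqrt(t * t / (t * t + df))


def cohen_label(r):
    """Cohen's conventional benchmarks for r: 0.1 small, 0.3 medium, 0.5 large."""
    r = abs(r)
    if r >= 0.5:
        return "large"
    if r >= 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "negligible"


if __name__ == "__main__":
    # Hypothetical efficiency scores for five versions of one cluster.
    group_scores = [0.70, 0.65, 0.80, 0.75, 0.60]
    all_scores = [0.62, 0.66, 0.71, 0.70, 0.58]
    t, df = paired_t(group_scores, all_scores)
    r = effect_size_r(t, df)
    print(f"t({df}) = {t:.2f}, r = {r:.2f} ({cohen_label(r)})")
    # prints: t(4) = 2.47, r = 0.78 (large)
```

The sketch stops at the test statistic. In practice, `scipy.stats.ttest_rel(group_scores, all_scores)` returns the same t together with a two-sided p-value, and `scipy.stats.wilcoxon` covers the non-parametric case.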
Conclusions: The two identified clusters were described and compared with results obtained by other researchers. The results of this work are a further step towards defining formal methods for reusing defect prediction models, by identifying groups of projects within which the same defect prediction model may be used. Furthermore, a clustering method was proposed and applied.
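The Method paragraph lists hierarchical clustering, k-means and Kohonen's neural network as the means of finding groups of similar projects. The following minimal sketch illustrates only the k-means step, under stated assumptions: each project is described by a hypothetical numeric feature vector (the abstract does not say which characteristics were used), features are z-scored first so that no single metric dominates the distance, and seeding is deterministic for reproducibility. It is not the authors' implementation. Discriminant analysis and the per-cluster statistical check are not included.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every feature column so that no metric dominates the distance."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns end up as all zeros
    return [tuple((x - m) / s for x, m, s in zip(row, means, sds)) for row in rows]


def _dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, init_indices=None, max_iter=100):
    """Plain Lloyd's k-means; returns (labels, centroids).

    Initial centroids are the points at init_indices (default: the first k),
    a deterministic simplification of the usual random or k-means++ seeding.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    seeds = list(init_indices) if init_indices is not None else list(range(k))
    if len(seeds) != k:
        raise ValueError("init_indices must contain exactly k indices")
    centroids = [tuple(points[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each project joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(mean(col) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical per-project features, e.g. (mean class size, defect ratio).
    features = [(12.0, 0.10), (14.0, 0.12), (13.0, 0.11),
                (30.0, 0.40), (28.0, 0.35), (31.0, 0.42)]
    labels, _ = kmeans(standardize(features), k=2, init_indices=[0, 3])
    print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Clusters obtained this way are only candidates. Following the Method paragraph, each one would still need its own group-based model and a statistical comparison against the all-projects model before it counts as a real group.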