Towards identifying software project clusters with regard to defect prediction

Authors:
Marian Jureczko;Lech Madeyski
Affiliations:
Wrocław University of Technology, Wrocław - Poland;Wrocław University of Technology, Wrocław - Poland
Venue:
Proceedings of the 6th International Conference on Predictive Models in Software Engineering
Year:
2010

Abstract

Background: This paper describes an analysis conducted on a newly collected repository containing 92 versions of 38 proprietary, open-source and academic projects. An earlier preliminary study showed the need for a further in-depth analysis in order to identify project clusters.
Aims: The goal of this research is to cluster software projects in order to identify groups of projects with similar characteristics from the defect prediction point of view. One defect prediction model should work well for all projects that belong to such a group. The existence of those groups was investigated with statistical tests and by comparing the mean values of prediction efficiency.
Method: Hierarchical and k-means clustering, as well as Kohonen's neural network, were used to find groups of similar projects. The obtained clusters were investigated with discriminant analysis. For each of the identified groups, a statistical analysis was conducted in order to determine whether the group really exists. Two defect prediction models were created for each identified group: the first was based on the projects that belong to the given group, and the second on all the projects. Both models were then applied to all versions of the projects in the investigated group. If the predictions from the group-based model are significantly better than those from the all-projects model (the mean values were compared and statistical tests were used), we conclude that the group really exists.
Results: Six different clusters were identified and the existence of two of them was statistically proven: 1) cluster proprietary B: T=19, p=0.035, r=0.40; 2) cluster proprietary/open: t(17)=3.18, p=0.05, r=0.59. The obtained effect sizes (r) represent large effects according to Cohen's benchmark, which is a substantial finding.
Conclusions: The two identified clusters were described and compared with results obtained by other researchers. This work takes a step towards defining formal methods for reusing defect prediction models by identifying groups of projects within which the same defect prediction model may be used. Furthermore, a method of clustering was suggested and applied.
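
The group-versus-all-projects comparison described in the Method section can be illustrated with a minimal Python sketch. It is not the authors' implementation. The efficiency values are hypothetical, the function name `paired_t_and_r` is invented for illustration, and higher values are assumed to mean better predictions. The conversion from t to the effect size r, r = sqrt(t² / (t² + df)), is a common convention and is an assumption here, since the abstract does not state which formula was used. The p-value lookup is omitted because it needs a t distribution, which the standard library does not provide.

```python
import math
from statistics import mean, stdev


def paired_t_and_r(group_scores, all_scores):
    """Return the paired t statistic, degrees of freedom, and effect size r.

    Each position in the two lists refers to the same project version,
    scored once by the group-based model and once by the all-projects model.
    """
    diffs = [g - a for g, a in zip(group_scores, all_scores)]
    n = len(diffs)
    df = n - 1
    # Paired t statistic: mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    # Assumed conversion from t to r (not confirmed by the abstract).
    r = math.sqrt(t * t / (t * t + df))
    return t, df, r


# Hypothetical prediction efficiency per project version (illustration only).
group_model = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69]
all_projects_model = [0.55, 0.57, 0.63, 0.61, 0.58, 0.60]

t, df, r = paired_t_and_r(group_model, all_projects_model)
print(f"t({df}) = {t:.2f}, r = {r:.2f}")
```

A positive t with a large r would support the claim that the group-based model predicts better for that cluster. For small samples or non-normal differences, a Wilcoxon signed-rank test (the T statistic reported for cluster proprietary B) would be the corresponding non-parametric choice.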