Feature selection and clustering in software quality prediction

  • Authors:
  • Qi Wang; Jie Zhu; Bo Yu

  • Affiliations:
  • Dept. of Electronic Engineering, Shanghai Jiaotong University, Shanghai, P. R. China; Dept. of Electronic Engineering, Shanghai Jiaotong University, Shanghai, P. R. China; System Verification Test Dept. of Lucent Technologies Optical Networks Co., Ltd, Shanghai, P. R. China

  • Venue:
  • EASE'07 Proceedings of the 11th international conference on Evaluation and Assessment in Software Engineering
  • Year:
  • 2007

Abstract

Software quality prediction models use the software metrics and fault data collected from previous software releases or similar projects to predict the quality of software components under development. Previous research has shown that such models can yield predictions with impressive accuracy. However, building an accurate software quality prediction model is still challenging for the following two reasons. First, outliers in software data often have a disproportionate effect on the overall predictive ability of the model. Second, not all collected software metrics should be used to construct the model because of the curse of dimensionality. To resolve these two problems, we present a new software quality prediction model based on a genetic algorithm (GA) in which outlier detection and feature selection are executed simultaneously. The experimental results show that this model performs better than some recently proposed software quality prediction models based on S-PLUS and TreeDisc. Furthermore, the clustered software components and selected features are easier for software engineers and data analysts to study and interpret.
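The abstract does not give the chromosome encoding, fitness function, or classifier, so the following Python sketch shows only one plausible way to run outlier detection and feature selection inside a single GA search. Every concrete choice here is an assumption, not the authors' method: the binary encoding (one gene per metric plus one keep/outlier gene per component), the leave-one-out 1-nearest-neighbour classifier used as a stand-in predictor, the penalty weights `alpha` and `beta`, and the hypothetical helper names `loo_1nn_accuracy`, `fitness`, and `evolve`.

```python
import random

# Hypothetical sketch: one GA chromosome encodes both decisions the abstract
# mentions. Genes [0, n_feat) select metrics (features); genes
# [n_feat, n_feat + n_samples) keep (1) or flag as outlier (0) each component.


def loo_1nn_accuracy(X, y, rows, feats):
    """Leave-one-out 1-nearest-neighbour accuracy on the kept rows/features."""
    correct = 0
    for i in rows:
        best_d, best_label = float("inf"), None
        for k in rows:
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in feats)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += int(best_label == y[i])
    return correct / len(rows)


def fitness(chrom, X, y, n_feat, alpha=0.5, beta=0.05):
    """Assumed fitness: accuracy minus penalties for dropped samples and used features."""
    feats = [j for j in range(n_feat) if chrom[j]]
    rows = [i for i in range(len(X)) if chrom[n_feat + i]]
    if not feats or len(rows) < 2:
        return float("-inf")  # invalid: nothing left to learn from
    acc = loo_1nn_accuracy(X, y, rows, feats)
    dropped = 1 - len(rows) / len(X)  # fraction flagged as outliers
    used = len(feats) / n_feat        # fraction of metrics kept
    return acc - alpha * dropped - beta * used


def evolve(X, y, pop_size=30, generations=30, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    n_feat = len(X[0])
    length = n_feat + len(X)
    cache = {}

    def score(c):
        key = tuple(c)
        if key not in cache:
            cache[key] = fitness(c, X, y, n_feat)
        return cache[key]

    # Seed with the "keep everything" chromosome so elitism guarantees the
    # result never scores worse than using all metrics and all components.
    pop = [[1] * length]
    pop += [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        elite = max(pop, key=score)
        next_pop = [elite[:]]  # elitism: best individual survives unchanged
        while len(next_pop) < pop_size:
            # Tournament selection (size 3) for each parent.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, length)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Toy data: metric 0 separates the classes, metric 1 is noise,
    # and the last component is deliberately mislabeled.
    X = [[0.0, 5.0], [0.1, 1.0], [0.2, 3.0], [1.0, 4.0],
         [1.1, 0.0], [0.9, 2.0], [0.05, 2.5]]
    y = [0, 0, 0, 1, 1, 1, 1]
    best, s = evolve(X, y)
    print("metrics kept:", best[:2], "components kept:", best[2:], "fitness:", round(s, 3))
```

The penalty on dropped samples is the key design choice in this sketch: without it, the GA could trivially raise accuracy by flagging every hard-to-classify component as an outlier. How the paper itself balances accuracy against discarded data is not stated in the abstract.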