Software Quality Analysis of Unlabeled Program Modules With Semisupervised Clustering

  • Authors:
  • Naeem Seliya;Taghi M. Khoshgoftaar

  • Affiliations:
  • Dept. of Comput. & Inf. Sci, Michigan Univ., Dearborn, MI;-

  • Venue:
  • IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Software quality assurance is a vital component of software project development. A software quality estimation model is trained using software measurement and defect (software quality) data of a previously developed release or similar project. Such an approach assumes that the development organization has experience with systems similar to the current project and that defect data are available for all modules in the training data. In software engineering practice, however, various practical issues limit the availability of defect data for modules in the training data. In addition, the organization may not have experience developing a similar system. In such cases, the task of software quality estimation or labeling modules as fault prone or not fault prone falls on the expert. We propose a semisupervised clustering scheme for software quality analysis of program modules with no defect data or quality-based class labels. It is a constraint-based semisupervised clustering scheme that uses k-means as the underlying clustering algorithm. Software measurement data sets obtained from multiple National Aeronautics and Space Administration software projects are used in our empirical investigation. The proposed technique is shown to aid the expert in making better estimations as compared to predictions made when the expert labels the clusters formed by an unsupervised learning algorithm. In addition, the software quality knowledge learnt during the semisupervised process provided good generalization performance for multiple test data sets. An analysis of program modules that remain unlabeled subsequent to our semisupervised clustering scheme provided useful insight into the characteristics of their software attributes