Unsupervised learning for expert-based software quality estimation

  • Authors:
  • Shi Zhong;Taghi M. Khoshgoftaar;Naeem Seliya

  • Affiliations:
  • Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL;Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL;Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL

  • Venue:
  • HASE'04 Proceedings of the Eighth IEEE international conference on High assurance systems engineering
  • Year:
  • 2004

Quantified Score

Hi-index 0.02

Visualization

Abstract

Current software quality estimation models often involve using supervised learning methods to train a software quality classifier or a software fault prediction model. In such models, the dependent variable is a software quality measurement indicating the quality of a software module by either a risk-based class membership (e.g., whether it is fault-prone or not fault-prone) or the number of faults. In reality, such a measurement may be inaccurate, or even unavailable. In such situations, this paper advocates the use of unsupervised learning (i.e., clustering) techniques to build a software quality estimation system, with the help of a software engineering human expert. The system first clusters hundreds of software modules into a small number of coherent groups and presents the representative of each group to a software quality expert, who labels each cluster as either fault-prone or not fault-prone based on his domain knowledge as well as some data statistics (without any knowledge of the dependent variable, i.e., the software quality measurement). Our preliminary empirical results show promising potentials of this methodology in both predicting software quality and detecting potential noise in a software measurement and quality dataset.