Unsupervised learning for expert-based software quality estimation

Authors:
Shi Zhong;Taghi M. Khoshgoftaar;Naeem Seliya
Affiliations:
Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL;Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL;Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL
Venue:
HASE'04 Proceedings of the Eighth IEEE international conference on High assurance systems engineering
Year:
2004

Abstract

Software quality estimation models are usually built with supervised learning: a software quality classifier or a software fault prediction model is trained on labeled data. In such models, the dependent variable is a software quality measurement that characterizes a module either by a risk-based class (fault-prone or not fault-prone) or by its number of faults. In practice, this measurement may be inaccurate or unavailable altogether. For such situations, this paper advocates using unsupervised learning (i.e., clustering), with the help of a human software engineering expert, to build a software quality estimation system. The system first clusters hundreds of software modules into a small number of coherent groups and presents a representative of each group to a software quality expert. The expert labels each cluster as either fault-prone or not fault-prone based on domain knowledge and some data statistics, without any knowledge of the dependent variable (i.e., the software quality measurement). Our preliminary empirical results suggest that this methodology is promising both for predicting software quality and for detecting potential noise in a software measurement and quality dataset.
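
The workflow described in the abstract (cluster the modules, have an expert label each cluster, then reuse those labels) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the use of plain k-means, the two metrics (lines of code and cyclomatic complexity), the toy module data, the recorded labels, and the `expert_label` rule, which merely stands in for a human expert's judgment.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two metric vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering algorithm)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Recompute assignments so they match the final centroids.
    return centroids, [nearest(p, centroids) for p in points]

def expert_label(centroid):
    """Hypothetical stand-in for the human expert: inspects a cluster
    representative and decides its class. The threshold is invented."""
    lines_of_code, _complexity = centroid
    return "fault-prone" if lines_of_code > 200 else "not fault-prone"

# Toy data (invented): (lines of code, cyclomatic complexity) per module.
modules = [(20, 2), (35, 3), (25, 2), (40, 4),
           (400, 30), (520, 45), (450, 38), (610, 50)]

centroids, assignments = kmeans(modules, k=2)
cluster_labels = {c: expert_label(centroids[c]) for c in range(len(centroids))}

def predict(module):
    """Estimate quality of a module from its nearest cluster's expert label."""
    return cluster_labels[nearest(module, centroids)]

# Noise check: recorded labels (invented, with module 2 deliberately wrong)
# that disagree with the expert's cluster label are flagged for review.
recorded = ["not fault-prone", "not fault-prone", "fault-prone", "not fault-prone",
            "fault-prone", "fault-prone", "fault-prone", "fault-prone"]
suspects = [i for i, a in enumerate(assignments) if cluster_labels[a] != recorded[i]]
```

In this sketch the expert never sees the recorded labels; they are consulted only afterwards, to flag modules whose recorded class disagrees with the cluster-level judgment.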