Feature selection and clustering in software quality prediction

Authors:
Qi Wang;Jie Zhu;Bo Yu
Affiliations:
Dept. of Electronic Engineering, Shanghai Jiaotong University, Shanghai, P. R. China; Dept. of Electronic Engineering, Shanghai Jiaotong University, Shanghai, P. R. China; System Verification Test Dept. of Lucent Technologies Optical Networks Co., Ltd, Shanghai, P. R. China
Venue:
EASE'07 Proceedings of the 11th international conference on Evaluation and Assessment in Software Engineering
Year:
2007

Abstract

Software quality prediction models use the software metrics and fault data collected from previous software releases or similar projects to predict the quality of software components under development. Previous research has shown that models of this kind can yield predictions with impressive accuracy. However, building an accurate software quality prediction model is still challenging for the following two reasons. First, outliers in software data often have a disproportionate effect on the overall predictive ability of the model. Second, not all collected software metrics should be used to construct the model because of the curse of dimensionality. To resolve these two problems, we present a new software quality prediction model based on a genetic algorithm (GA) in which outlier detection and feature selection are executed simultaneously. The experimental results show that this model performs better than some recently proposed software quality prediction models based on S-PLUS and TreeDisc. Furthermore, the clustered software components and selected features are easier for software engineers and data analysts to study and interpret.
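
The abstract does not give the chromosome encoding, fitness function, or underlying classifier, so the following is only a minimal sketch of how a GA might search for a feature subset and an outlier set at the same time. Everything in it is an assumption made for illustration: the binary chromosome (one bit per metric followed by one bit per training module), the leave-one-out 1-nearest-neighbour accuracy used as fitness, the penalty weights, and the names `fitness` and `evolve`. It is not the authors' implementation.

```python
import random

def fitness(chrom, X, y, n_feat, feat_penalty=0.01, out_penalty=0.02):
    """Score one chromosome (hypothetical encoding): the first n_feat bits
    select features, the remaining bits flag rows as outliers (1 = excluded)."""
    feats = [j for j in range(n_feat) if chrom[j]]
    keep = [i for i in range(len(X)) if not chrom[n_feat + i]]
    if not feats or len(keep) < 2:
        return 0.0  # nothing to evaluate
    correct = 0
    for i in keep:  # leave-one-out 1-nearest-neighbour on the kept rows
        nearest = min((k for k in keep if k != i),
                      key=lambda k: sum((X[i][j] - X[k][j]) ** 2 for j in feats))
        correct += y[nearest] == y[i]
    accuracy = correct / len(keep)
    # Assumed penalties discourage large feature sets and excessive row removal.
    return accuracy - feat_penalty * len(feats) - out_penalty * (len(X) - len(keep))

def evolve(X, y, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    """Return (feature_mask, outlier_mask) of the best chromosome found."""
    rng = random.Random(seed)
    n_feat, length = len(X[0]), len(X[0]) + len(X)
    # Random feature bits; outlier bits start mostly at 0 (few rows excluded).
    pop = [[rng.randint(0, 1) if g < n_feat else int(rng.random() < 0.1)
            for g in range(length)] for _ in range(pop_size)]
    score = lambda c: fitness(c, X, y, n_feat)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]  # elitism: carry over the two best chromosomes
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 3), key=score)  # tournament selection
            b = max(rng.sample(pop, 3), key=score)
            cut = rng.randrange(1, length)          # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation applied independently to every gene.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=score)
    return best[:n_feat], best[n_feat:]
```

Calling `evolve(X, y)` with a list of metric vectors `X` and fault labels `y` returns two bit masks, one over metrics and one over modules. Because both masks live in the same chromosome, a single fitness evaluation judges a feature subset and an outlier set together, which is one plausible reading of "executed simultaneously".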