An empirical investigation of filter attribute selection techniques for software quality classification

Authors:
Kehan Gao; Taghi M. Khoshgoftaar; Huanjing Wang
Affiliations:
Eastern Connecticut State University, Willimantic, Connecticut; Florida Atlantic University, Boca Raton, Florida; Western Kentucky University, Bowling Green, Kentucky
Venue:
IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration
Year:
2009

Abstract

Attribute selection is an important step in data preprocessing for software quality modeling and other data mining problems. Software quality models have been used to improve the fault detection process. Finding faulty components early in the software development process can lead to a more reliable final product and can reduce development and maintenance costs. Some studies have shown that the prediction accuracy of such models improves when irrelevant and redundant features are removed from the original data set. In this study, we investigated four filter attribute selection techniques, Automatic Hybrid Search (AHS), Rough Sets (RS), Kolmogorov-Smirnov (KS), and Probabilistic Search (PS), and applied them to a very large telecommunications software system. To evaluate classification performance on the smaller attribute subsets selected by each technique, we built classification models using five different classifiers. The empirical results demonstrated that, by applying an attribute selection approach, we can build classification models with accuracy comparable to that of models built with the complete set of attributes. Moreover, the selected subsets contain fewer than 15 percent of the complete set of attributes. Therefore, the metrics collection, model calibration, model validation, and model evaluation times of future development efforts on similar systems can be significantly reduced. In addition, we demonstrated that our recently proposed attribute selection technique, KS, outperformed the other three attribute selection techniques.
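
The abstract does not describe how the KS technique scores attributes. As a rough illustration of the general idea of a filter-style ranker, the sketch below assumes that each attribute is scored by the two-sample Kolmogorov-Smirnov statistic between its values in the fault-prone class and its values in the not-fault-prone class, and that attributes are then ranked by that score. The function names, the toy data, and the attribute names `loc` and `noise` are hypothetical and are not taken from the paper.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs of a and b."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    # The gap between two step functions can only change at observed values.
    points = sorted(set(a_sorted) | set(b_sorted))
    return max(
        abs(bisect_right(a_sorted, x) / len(a_sorted)
            - bisect_right(b_sorted, x) / len(b_sorted))
        for x in points
    )


def rank_attributes(rows, labels, names):
    """Score each attribute by the KS statistic between the two classes.

    Assumption: label 1 marks a fault-prone module, label 0 a not-fault-prone one.
    Returns (name, score) pairs sorted from most to least discriminating.
    """
    scores = {}
    for j, name in enumerate(names):
        fault_prone = [r[j] for r, y in zip(rows, labels) if y == 1]
        not_fault_prone = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores[name] = ks_statistic(fault_prone, not_fault_prone)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: "loc" separates the classes, "noise" does not.
names = ["loc", "noise"]
rows = [(10, 5), (12, 1), (11, 4), (50, 2), (60, 5), (55, 3)]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_attributes(rows, labels, names)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A filter of this kind scores each attribute independently of any classifier, so a practitioner would typically keep only the top-ranked attributes and then train the classifiers on that reduced set.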