Software measurement data reduction using ensemble techniques

Authors:
Huanjing Wang;Taghi M. Khoshgoftaar;Amri Napolitano
Affiliations:
Department of Mathematics and Computer Science, Western Kentucky University, Bowling Green, KY 42101, United States;Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States;Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, United States
Venue:
Neurocomputing
Year:
2012




Abstract

Software defect prediction models are used to identify high-risk program modules, that is, modules likely to contain a high number of faults. These models are built from software metrics collected during the software development process. Various techniques have been proposed to improve fault prediction, and one of them is feature (metric) selection. Choosing the most relevant features is important for improving the effectiveness of defect predictors. However, a single feature subset selection method may settle on a locally optimal subset. Ensembles of feature selection methods instead combine the results of multiple methods. In this paper, we present a comprehensive empirical study examining 17 different ensembles of feature ranking techniques (rankers), built from six commonly used feature ranking techniques, the signal-to-noise filter technique, and 11 threshold-based feature ranking techniques. The study used 16 real-world software measurement data sets of different sizes and built 54,400 classification models using four well-known classifiers. The main conclusion is that ensembles of very few rankers are highly effective, and even outperform ensembles of many or all rankers.
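
To make the idea of a ranker ensemble concrete, the following is a minimal sketch and not the procedure used in the paper. The abstract does not say how the individual rankings are combined, so mean-rank aggregation is an assumption here. The signal-to-noise score follows its common definition, while `mean_difference`, the metric names, and the toy data are hypothetical and chosen only for illustration.

```python
from statistics import mean, pstdev


def _split(values, labels):
    """Split one feature's values into faulty (1) and non-faulty (0) groups."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return pos, neg


def signal_to_noise(values, labels):
    """S2N score: |mean_pos - mean_neg| / (std_pos + std_neg)."""
    pos, neg = _split(values, labels)
    denom = pstdev(pos) + pstdev(neg)
    return abs(mean(pos) - mean(neg)) / denom if denom else 0.0


def mean_difference(values, labels):
    """Hypothetical second ranker: absolute difference of class means."""
    pos, neg = _split(values, labels)
    return abs(mean(pos) - mean(neg))


def to_ranks(scores):
    """Convert scores to ranks (1 = best); ties keep the original feature order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def ensemble_rank(features, labels, rankers, k):
    """Return the k features with the best (lowest) mean rank across rankers.

    Mean-rank aggregation is an assumption made for this sketch.
    """
    names = list(features)
    per_ranker = [
        to_ranks([ranker(features[name], labels) for name in names])
        for ranker in rankers
    ]
    mean_rank = {
        name: mean(ranks[i] for ranks in per_ranker)
        for i, name in enumerate(names)
    }
    return sorted(names, key=lambda name: mean_rank[name])[:k]


# Toy data: six modules, 1 = fault-prone, 0 = not fault-prone (made up).
labels = [1, 1, 1, 0, 0, 0]
features = {
    "loc": [120, 150, 130, 40, 50, 45],
    "comments": [10, 12, 11, 10, 12, 11],
    "complexity": [9, 4, 8, 3, 5, 2],
}

selected = ensemble_rank(features, labels, [signal_to_noise, mean_difference], k=2)
print(selected)
```

In this sketch an ensemble is just the list of rankers passed to `ensemble_rank`, so comparing small and large ensembles amounts to varying that list and evaluating a classifier trained on the selected metrics.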