Feature selection with biased sample distributions

  • Authors:
  • Abu H. M. Kamal, Xingquan Zhu, Abhijit Pandya, Sam Hsu

  • Affiliations:
  • All authors: Dept. of Computer Science & Engineering, Florida Atlantic University, Boca Raton, FL; Xingquan Zhu also with the Faculty of Engineering & Information Tech., University of Technology, Sydney, NSW, Australia

  • Venue:
  • IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration
  • Year:
  • 2009

Abstract

Feature selection concerns the problem of selecting a number of important features (with respect to the class labels) in order to build accurate prediction models. Traditional feature selection methods, however, fail to take sample distributions into consideration, which may lead to poor predictions for minority class examples. Due to the sophistication and cost involved in the data collection process, many applications, such as biomedical research, commonly face biased data collections in which one class of examples (e.g., diseased samples) is significantly smaller than the others (e.g., normal samples). For these applications, the minority class examples, such as disease samples, credit card frauds, and network intrusions, form only a small portion of the data collection but deserve full attention for accurate prediction. In this paper, we propose three filtering techniques, Higher Weight (HW), Differential Minority Repeat (DMR), and Balanced Minority Repeat (BMR), to identify important features from biased data collections. Experimental comparisons with the ReliefF method on five datasets demonstrate the effectiveness of the proposed methods in selecting informative features from data with biased sample distributions.
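To illustrate the general idea of class-aware filter feature selection on imbalanced data, the following is a minimal sketch. It is not the paper's HW, DMR, or BMR methods (whose details are not given in the abstract); it is a hypothetical filter score that weights features by how well they separate the minority class, so that rare-class examples are not swamped by the majority class.

```python
# Hypothetical illustration of a class-aware filter score for
# imbalanced data; NOT the paper's HW/DMR/BMR techniques.

def class_aware_scores(X, y):
    """Score each feature by the gap between its minority-class and
    majority-class means, so features that separate the rare class
    rank highly regardless of how few minority samples there are."""
    n_features = len(X[0])
    minority = min(set(y), key=y.count)          # rarest class label
    min_rows = [x for x, label in zip(X, y) if label == minority]
    maj_rows = [x for x, label in zip(X, y) if label != minority]
    scores = []
    for j in range(n_features):
        min_mean = sum(r[j] for r in min_rows) / len(min_rows)
        maj_mean = sum(r[j] for r in maj_rows) / len(maj_rows)
        scores.append(abs(min_mean - maj_mean))
    return scores

def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = class_aware_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

# Toy imbalanced data: feature 0 separates the minority class,
# feature 1 is noise.
X = [[5.0, 1.0], [5.1, 0.9], [0.1, 1.1],
     [0.0, 1.0], [0.2, 0.8], [0.1, 1.2]]
y = [1, 1, 0, 0, 0, 0]  # class 1 is the minority

print(select_top_k(X, y, 1))  # -> [0]
```

A distribution-blind filter (e.g., overall variance) could instead favor features driven by the majority class alone, which is exactly the failure mode the abstract describes.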