DBFS: An effective Density Based Feature Selection scheme for small sample size and high dimensional imbalanced data sets

Authors:
Mina Alibeigi; Sattar Hashemi; Ali Hamzeh
Venue:
Data & Knowledge Engineering
Year:
2012

Abstract

Imbalanced data sets are pervasive in real-world practice and have therefore become an active research area in the machine learning community. Class imbalance significantly degrades the performance of standard classifiers when they are used to learn the concepts underlying the data, and the problem becomes even more severe when imbalanced data sets are also high dimensional. This paper presents a novel feature ranking approach, based on probability density estimation, to cope with these issues. The idea behind our approach, named Density Based Feature Selection (DBFS), is that the distributions of features over the classes can bring significant benefits to feature selection algorithms. In other words, to explore the contribution of each attribute and assign it an appropriate rank, DBFS takes into account each feature's distributions over all classes along with their correlations. To show the effectiveness of the presented approach, well-known feature ranking methods are implemented and compared with it across a variety of small sample size, high dimensional data sets from the microarray, mass spectrometry and text mining domains. Our theoretical analysis and experimental observations indicate that DBFS is the method of choice, offering a simple yet effective feature ranking method based on well-known statistical evaluation measures.
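The abstract does not state the DBFS scoring function, so the following is only a minimal sketch of the general idea it describes, under assumptions of our own: each feature is scored by how little its class-conditional densities overlap (estimated with a 1-D Gaussian kernel density estimate), and features are then picked greedily with the score discounted by the absolute Pearson correlation with features already chosen. The kernel, the bandwidth rule, the overlap score, the multiplicative correlation penalty and the names `separability` and `rank_features` are all hypothetical illustrations, not the authors' method.

```python
# Hypothetical sketch of density-based feature ranking.
# NOT the DBFS formula from the paper: the KDE, overlap score and
# correlation penalty are illustrative assumptions.
import itertools

import numpy as np


def gaussian_kde_1d(samples, grid):
    """Evaluate a 1-D Gaussian kernel density estimate of `samples` on `grid`."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    std = samples.std(ddof=1) if n > 1 else 0.0
    # Normal-reference (rule-of-thumb) bandwidth; fall back to 1.0 for constant samples.
    bandwidth = 1.06 * std * n ** (-1 / 5) if std > 0 else 1.0
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))


def separability(feature, labels, grid_size=512):
    """1 - mean pairwise overlap of one feature's class-conditional densities.

    Requires at least two classes. 0 means the class densities coincide;
    values near 1 mean they barely overlap.
    """
    feature = np.asarray(feature, dtype=float)
    labels = np.asarray(labels)
    span = feature.max() - feature.min()
    pad = span if span > 0 else 1.0
    grid = np.linspace(feature.min() - pad, feature.max() + pad, grid_size)
    dx = grid[1] - grid[0]
    densities = []
    for c in np.unique(labels):
        p = gaussian_kde_1d(feature[labels == c], grid)
        densities.append(p / (p.sum() * dx))  # renormalise on the finite grid
    overlaps = [np.minimum(p, q).sum() * dx
                for p, q in itertools.combinations(densities, 2)]
    return float(np.clip(1.0 - np.mean(overlaps), 0.0, 1.0))


def rank_features(X, y, n_select=None):
    """Greedy ranking: separability score discounted by |correlation| with chosen features."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    n_select = n_features if n_select is None else n_select
    scores = np.array([separability(X[:, j], y) for j in range(n_features)])
    # Absolute Pearson correlations; NaN (from constant columns) treated as uncorrelated.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.nan_to_num(np.abs(np.atleast_2d(np.corrcoef(X, rowvar=False))))
    selected, remaining = [], list(range(n_features))
    while remaining and len(selected) < n_select:
        penalty = corr[:, selected].max(axis=1) if selected else np.zeros(n_features)
        adjusted = scores * (1.0 - penalty)
        best = max(remaining, key=lambda j: adjusted[j])
        selected.append(best)
        remaining.remove(best)
    return selected, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.repeat([0, 1], 50)
    x0 = rng.normal(0.0, 1.0, 100) + 4.0 * y  # separates the classes
    x1 = rng.normal(0.0, 1.0, 100)            # pure noise
    x2 = x0 + rng.normal(0.0, 0.01, 100)      # near-duplicate of x0
    order, scores = rank_features(np.column_stack([x0, x1, x2]), y)
    print(order, np.round(scores, 2))
```

In this toy setup the near-duplicate of the informative feature has a high separability score of its own, but the correlation discount pushes it behind the noise feature once the original has been selected; this is meant only to show why combining class-wise distributions with correlation information can matter in ranking, not how DBFS weighs the two.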