Learning classifiers from imbalanced data based on biased minimax probability machine

Authors:
Kaizhu Huang; Haiqin Yang; Irwin King; Michael R. Lyu
Affiliations:
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong (all authors)
Venue:
CVPR'04: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Year:
2004

References (5):
Machine Learning for the Detection of Oil Spills in Satellite Radar Images (Machine Learning - Special issue on applications of machine learning and the knowledge discovery process)
Increasing sensitivity of preterm birth by changing rule strengths (Pattern Recognition Letters - Special issue: Rough sets, pattern recognition and data mining)
Improved Rooftop Detection in Aerial Images with Machine Learning (Machine Learning)
A robust minimax approach to classification (The Journal of Machine Learning Research)
The use of the area under the ROC curve in the evaluation of machine learning algorithms (Pattern Recognition)

Cited by (18):
Linear Asymmetric Classifier for cascade detectors (ICML '05 Proceedings of the 22nd international conference on Machine learning)
A comparative study of Minimax Probability Machine-based approaches for face recognition (Pattern Recognition Letters)
An Evaluation of the Robustness of MTS for Imbalanced Data (IEEE Transactions on Knowledge and Data Engineering)
2008 Special Issue: Robust BMPM training based on second-order cone programming and its application in medical diagnosis (Neural Networks)
An information granulation based data mining approach for classifying imbalanced data (Information Sciences: an International Journal)
FAST: a roc-based feature selection metric for small samples and imbalanced data classification problems (Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining)
Using granular computing model to induce scheduling knowledge in dynamic manufacturing environments (International Journal of Computer Integrated Manufacturing)
Local reweight wrapper for the problem of imbalance (International Journal of Artificial Intelligence and Soft Computing)
Localized support vector regression for time series prediction (Neurocomputing)
Exploratory undersampling for class-imbalance learning (IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics)
An unsupervised self-organizing learning with support vector ranking for imbalanced datasets (Expert Systems with Applications: An International Journal)
Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning (ICIC'05 Proceedings of the 2005 international conference on Advances in Intelligent Computing - Volume Part I)
Research article: Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions (Computational Biology and Chemistry)
Improved response modeling based on clustering, under-sampling, and ensemble (Expert Systems with Applications: An International Journal)
Biased minimax probability machine active learning for relevance feedback in content-based image retrieval (IDEAL'06 Proceedings of the 7th international conference on Intelligent Data Engineering and Automated Learning)
Preprocessing unbalanced data using support vector machine (Decision Support Systems)
DBFS: An effective Density Based Feature Selection scheme for small sample size and high dimensional imbalanced data sets (Data & Knowledge Engineering)
Rare category exploration (Expert Systems with Applications: An International Journal)

Abstract

We consider the problem of binary classification on imbalanced data, in which nearly all instances are labelled as one class while far fewer are labelled as the other, usually more important, class. Traditional machine learning methods that seek accurate performance over the full range of instances are not suitable for this problem, since they tend to classify all the data into the majority class, which is usually the less important one. Some current methods have tried to use intermediate factors, e.g., the distribution of the training set, the decision thresholds, or the cost matrices, to influence the bias of the classification. However, it remains uncertain whether these methods can improve performance in a systematic way. In this paper, we propose a novel model named the Biased Minimax Probability Machine. Unlike previous methods, this model directly controls the worst-case real accuracy of classification of future data to build biased classifiers. Hence, it provides a rigorous treatment of imbalanced data. Experimental results comparing the novel model with three competitive methods, i.e., the Naive Bayesian classifier, the k-Nearest Neighbor method, and the decision tree method C4.5, demonstrate the superiority of our model.
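
The abstract does not spell out how a worst-case accuracy is computed, so the following is only a minimal sketch under stated assumptions: a linear decision rule `a @ x >= b`, classes described solely by a mean and a covariance matrix, and the standard moment-based bound from the minimax probability machine literature (worst-case accuracy = kappa^2 / (1 + kappa^2), with kappa the margin divided by the standard deviation along `a`). The function name `worst_case_accuracy` and the toy means and covariances are hypothetical and are not taken from the paper.

```python
import numpy as np

def worst_case_accuracy(a, b, mean, cov):
    """Lower bound on P(a @ x >= b) over every distribution with this mean and covariance.

    Assumption: the moment-based (generalized Chebyshev) bound
    kappa^2 / (1 + kappa^2), kappa = (a @ mean - b) / sqrt(a @ cov @ a).
    """
    a = np.asarray(a, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    margin = float(a @ mean - b)
    if margin <= 0.0:
        return 0.0  # the class mean lies on the wrong side: no guarantee at all
    spread = float(a @ cov @ a)
    if spread == 0.0:
        return 1.0  # no variance along a, so every sample is on the correct side
    kappa_sq = margin ** 2 / spread
    return kappa_sq / (1.0 + kappa_sq)

# Hypothetical toy classes: x is the important (minority) class, y the majority class.
mean_x, cov_x = np.array([2.0, 0.0]), np.eye(2)
mean_y, cov_y = np.array([-2.0, 0.0]), np.eye(2)
a = np.array([1.0, 0.0])

# Unbiased threshold: both classes receive the same worst-case guarantee.
alpha_even = worst_case_accuracy(a, 0.0, mean_x, cov_x)
beta_even = worst_case_accuracy(-a, 0.0, mean_y, cov_y)  # P(a @ y <= b)

# Threshold shifted toward the majority class: the guarantee for x rises, the one for y falls.
b_biased = -1.0
alpha_biased = worst_case_accuracy(a, b_biased, mean_x, cov_x)
beta_biased = worst_case_accuracy(-a, -b_biased, mean_y, cov_y)

print(alpha_even, beta_even)
print(alpha_biased, beta_biased)
```

Moving the threshold trades the guarantee of one class against the other. A biased classifier in the sense of the abstract would pick the hyperplane that favors the important class on purpose, rather than influencing the bias indirectly through resampling, thresholds, or cost matrices.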