A New Performance Measure for Class Imbalance Learning. Application to Bioinformatics Problems

Authors:
Rukshan Batuwita;Vasile Palade
Affiliations:
-;-
Venue:
ICMLA '09 Proceedings of the 2009 International Conference on Machine Learning and Applications
Year:
2009

Citing 0
Cited 3

Sample subset optimization for classifying imbalanced biological data

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Weighted Online Sequential Extreme Learning Machine for Class Imbalance Learning

Neural Processing Letters
Evaluation of a new hybrid algorithm for highly imbalanced classification problems

International Journal of Hybrid Intelligent Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

In class imbalance learning, the performance measure used for the model selection would play a vital role. It has been well-studied in the past research that the most widely used performance measure, the overall accuracy of the model, can lead to sub-optimal classification models when learning from imbalanced datasets. In order to overcome this problem, other performance measures, such as the Geometric-mean (Gm) and F-measure (Fm), have been used for imbalanced dataset learning. Training a classifier system with an imbalanced dataset (where the positive class is the minority class) would usually produce sub-optimal models having a higher Specificity (SP) and a lower Sensitivity (SE). By applying class imbalance learning methods, we would often be able to increase the SE by sacrificing some amount of SP. In some type of real world imbalanced classification problems, such as the gene finding Bioinformatics problems, it is important to improve the SE as much as possible by keeping the reduction of SP to the minimum. In this paper, we show that with respect to this type of classification problems the existing performance measures used in class imbalance learning (Gm and Fm) can still result in sub-optimal classification models. In order to circumvent these problems, we introduced a new performance measure, called Adjusted Geometric-mean (AGm). We show, both analytically and empirically on two real-world Bioinformatics datasets, that AGm can perform better than Gm and Fm metrics.