Classifying Remote Sensing Data with Support Vector Machines and Imbalanced Training Data

Authors:
Björn Waske;Jon Atli Benediktsson;Johannes R. Sveinsson
Affiliations:
Faculty of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland 107;Faculty of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland 107;Faculty of Electrical and Computer Engineering, University of Iceland, Reykjavik, Iceland 107
Venue:
MCS '09 Proceedings of the 8th International Workshop on Multiple Classifier Systems
Year:
2009

Abstract

This paper addresses the classification of remote sensing data with imbalanced training data. The classification accuracy of a supervised method is affected by several factors, such as the classifier algorithm, the input data, and the available training data. An imbalanced training set, i.e., one in which the number of training samples from one class is much smaller than from the other classes, often results in low classification accuracies for the small classes. In the present study, support vector machines (SVMs) are trained with imbalanced training data. To handle the imbalance, the training data are resampled (i.e., bagging) and a multiple classifier system, with an SVM as the base classifier, is generated. In addition to the classifier ensemble, a single SVM is applied to the data, using both the original balanced and the imbalanced training data sets. The results underline that SVM classification is affected by imbalanced data sets, which lead to markedly lower classification accuracies for classes with fewer training samples. Moreover, the detailed accuracy assessment demonstrates that the proposed approach significantly improves the class accuracies achieved by a single SVM trained on the whole imbalanced training data set.
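The resampling-plus-ensemble idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how the bootstrap samples are drawn or how the ensemble outputs are combined, so the sketch assumes that every class is resampled with replacement to the size of the smallest class and that predictions are fused by majority vote. All names (`balanced_bootstrap`, `BaggedEnsemble`, `NearestCentroid`) are hypothetical, and the nearest-centroid learner is only a dependency-free stand-in for the SVM base classifier.

```python
import random
from collections import Counter, defaultdict


def balanced_bootstrap(X, y, rng):
    """Draw a bootstrap sample with the same number of samples per class.

    Assumption: each class is resampled (with replacement) to the size of
    the smallest class; the source does not specify the exact scheme.
    """
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    n_per_class = min(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        for _ in range(n_per_class):
            Xb.append(rng.choice(rows))
            yb.append(label)
    return Xb, yb


class BaggedEnsemble:
    """Train one base classifier per balanced bootstrap; fuse by majority vote."""

    def __init__(self, make_classifier, n_estimators=25, seed=0):
        # make_classifier: factory returning an object with fit(X, y) and predict(X)
        self.make_classifier = make_classifier
        self.n_estimators = n_estimators
        self.rng = random.Random(seed)
        self.members = []

    def fit(self, X, y):
        self.members = []
        for _ in range(self.n_estimators):
            Xb, yb = balanced_bootstrap(X, y, self.rng)
            clf = self.make_classifier()
            clf.fit(Xb, yb)
            self.members.append(clf)
        return self

    def predict(self, X):
        # votes holds one list of predicted labels per ensemble member
        votes = [clf.predict(X) for clf in self.members]
        return [Counter(column).most_common(1)[0][0] for column in zip(*votes)]


class NearestCentroid:
    """Stand-in base learner so the sketch runs without dependencies (not an SVM)."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [min(self.centroids, key=lambda c: dist(x, self.centroids[c])) for x in X]
```

To approximate the setting in the abstract, the factory passed to `BaggedEnsemble` would return an SVM instead of the stand-in; any classifier object exposing `fit` and `predict` (scikit-learn's `SVC`, for example) should fit this interface. An odd `n_estimators` avoids tied votes in the two-class case.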