Improving SVM classification on imbalanced time series data sets with ghost points

Authors:
Suzan Köknar-Tezel;Longin Jan Latecki
Affiliations:
Temple University, Department of Computer and Information Sciences, Philadelphia, PA, USA;Temple University, Department of Computer and Information Sciences, Philadelphia, PA, USA
Venue:
Knowledge and Information Systems
Year:
2011

Abstract

Imbalanced data sets present a particular challenge to the data mining community. Often, it is the rare event that is of interest, and the cost of misclassifying the rare event is higher than that of misclassifying the usual event. When the data is highly skewed toward the usual event, it can be very difficult for a learning system to accurately detect the rare event. Many approaches for handling imbalanced data sets have been proposed in recent years, from under-sampling the majority class to adding synthetic points to the minority class in feature space. However, distances between time series are known to be non-Euclidean and non-metric, since comparing time series requires warping in time. This fact makes it impossible to apply standard methods such as SMOTE to insert synthetic data points in feature spaces. We present an innovative approach that augments the minority class by adding synthetic points in distance spaces. We then use Support Vector Machines (SVMs) for classification. Our experimental results on standard time series show that our synthetic points significantly improve the classification rate of the rare events and, in most cases, also improve the overall accuracy of SVMs. We also show how adding our synthetic points can aid in the visualization of time series data sets.
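
To illustrate the claim that warping-based distances between time series are non-metric, the following minimal sketch computes a textbook dynamic time warping (DTW) distance and checks the triangle inequality on three toy series. It is not taken from the paper: the function name dtw_distance, the absolute-difference local cost, and the toy series x, y, z are all assumptions chosen for illustration, and the paper's own distance measure may differ.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two numeric sequences.

    Assumption for illustration: the local cost is |a[i] - b[j]| and no
    warping-window constraint is applied.
    """
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical toy series of different lengths.
x = [0]
y = [1, 2]
z = [2, 3, 3]

d_xy = dtw_distance(x, y)
d_yz = dtw_distance(y, z)
d_xz = dtw_distance(x, z)

# A metric would require d(x, z) <= d(x, y) + d(y, z).
triangle_holds = d_xz <= d_xy + d_yz
print(d_xy, d_yz, d_xz, triangle_holds)
```

In this toy case the direct distance from x to z exceeds the sum of the two intermediate distances, so the triangle inequality fails. Interpolating coordinates in a feature space, as SMOTE does, presupposes a vector-space geometry that such a distance does not provide, which is the motivation the abstract gives for working in distance space instead.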