INSIGHT: efficient and effective instance selection for time-series classification

Authors:
Krisztian Buza;Alexandros Nanopoulos;Lars Schmidt-Thieme
Affiliations:
Information Systems and Machine Learning Lab, University of Hildesheim, Germany;Information Systems and Machine Learning Lab, University of Hildesheim, Germany;Information Systems and Machine Learning Lab, University of Hildesheim, Germany
Venue:
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part II
Year:
2011

Abstract

Time-series classification is a widely examined data mining task with various scientific and industrial applications. Recent research in this domain has shown that the simple nearest-neighbor classifier using Dynamic Time Warping (DTW) as the distance measure performs exceptionally well, in most cases outperforming more advanced classification algorithms. Instance selection is a commonly applied approach for improving the efficiency of the nearest-neighbor classifier with respect to classification time. This approach reduces the size of the training set by selecting the most representative instances and using only those when classifying new instances. In this paper, we introduce a novel instance selection method that exploits the hubness phenomenon in time-series data, that is, the tendency of a few instances to appear as nearest neighbors far more often than the remaining instances. Based on hubness, we propose a framework for score-based instance selection, which is combined with a principled approach to selecting instances that optimize the coverage of the training data. We discuss the theoretical considerations of casting the instance selection problem as a graph-coverage problem and analyze the resulting complexity. We experimentally compare the proposed method, denoted INSIGHT, against FastAWARD, a state-of-the-art instance selection method for time series. Our results indicate substantial improvements in classification accuracy and a drastic reduction (orders of magnitude) in execution time.
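The idea of hubness-driven, score-based selection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the score functions or the coverage optimization, so the score used here (how often an instance is the 1-nearest neighbor of another instance carrying the same label) is an assumption, as are all function names (`dtw`, `good_occurrence_scores`, `select_instances`, `classify`). The DTW routine is the plain unconstrained dynamic program with an absolute-difference cost.

```python
from collections import Counter
from math import inf


def dtw(a, b):
    """Unconstrained DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def good_occurrence_scores(series, labels):
    """Assumed score: number of times an instance is the 1-NN of
    another training instance that has the same label."""
    scores = Counter()
    for i, x in enumerate(series):
        # nearest neighbor of x among all other training instances
        nn = min((j for j in range(len(series)) if j != i),
                 key=lambda j: dtw(x, series[j]))
        if labels[nn] == labels[i]:
            scores[nn] += 1
    return [scores[i] for i in range(len(series))]


def select_instances(series, labels, n_keep):
    """Keep the indices of the n_keep highest-scoring instances."""
    scores = good_occurrence_scores(series, labels)
    ranked = sorted(range(len(series)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


def classify(query, series, labels, selected):
    """1-NN classification restricted to the selected instances."""
    nn = min(selected, key=lambda i: dtw(query, series[i]))
    return labels[nn]


if __name__ == "__main__":
    # toy data, invented for illustration only
    series = [[0, 0, 1, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 1, 0],
              [5, 5, 6, 5, 5], [5, 6, 5, 5, 5], [5, 5, 5, 6, 5]]
    labels = ["a", "a", "a", "b", "b", "b"]
    kept = select_instances(series, labels, n_keep=2)
    print(kept, classify([0, 0, 0, 0, 1], series, labels, kept))
```

Classification then needs only `n_keep` DTW computations per query instead of one per training instance, which is where the speed-up comes from. A plain top-n ranking like this one does not guarantee that every class or every region of the training set is represented among the kept instances; the coverage-based formulation mentioned in the abstract addresses that aspect and is not reproduced here.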