Clustering Time Series with Clipped Data

Authors:
Anthony Bagnall; Gareth Janacek
Affiliations:
School of Computing Sciences, University of East Anglia, Norwich, UK; School of Computing Sciences, University of East Anglia, Norwich, UK
Venue:
Machine Learning
Year:
2005

Abstract

Clustering time series is a problem that has applications in a wide variety of fields, and has recently attracted a large amount of research. Time series data are often large and may contain outliers. We show that the simple procedure of clipping the time series (discretising to above or below the median) reduces memory requirements and significantly speeds up clustering without decreasing clustering accuracy. We also demonstrate that clipping increases clustering accuracy when there are outliers in the data, thus serving as a means of outlier detection and a method of identifying model misspecification. We consider simulated data from polynomial, autoregressive moving average and hidden Markov models and show that the estimated parameters of the clipped data used in clustering tend, asymptotically, to those of the unclipped data. We also demonstrate experimentally that, if the series are long enough, the accuracy on clipped data is not significantly less than the accuracy on unclipped data, and if the series contain outliers then clipping results in significantly better clusterings. We then illustrate how using clipped series can be of practical benefit in detecting model misspecification and outliers on two real world data sets: an electricity generation bid data set and an ECG data set.
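
The clipping step described in the abstract can be illustrated with a short sketch. This is not the authors' implementation. It assumes a convention of 1 for observations strictly above the series median and 0 otherwise, and it assumes NumPy is available. The helper names `clip_series` and `pack_clipped` are hypothetical, and the data are synthetic, generated only for this illustration.

```python
import numpy as np


def clip_series(series):
    """Discretise a series: 1 where a value is above the series median, else 0.

    The 1/0 convention is an assumption made for this sketch.
    """
    values = np.asarray(series, dtype=float)
    return (values > np.median(values)).astype(np.uint8)


def pack_clipped(clipped):
    """Store eight clipped observations per byte (hypothetical helper)."""
    return np.packbits(clipped)


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
series = rng.normal(size=1024)

# Same series with one gross outlier injected.
with_outlier = series.copy()
with_outlier[10] = 1e6

clipped = clip_series(series)
clipped_outlier = clip_series(with_outlier)
packed = pack_clipped(clipped)

# Memory: 1024 float64 values versus 1024 bits.
raw_bytes = series.nbytes
packed_bytes = packed.nbytes

# Number of positions where the clipped series changed because of the outlier.
changed_bits = int(np.sum(clipped != clipped_outlier))
```

In this sketch the packed representation uses one bit per observation instead of one 64-bit float. A single extreme value can shift the median by at most one rank, so it changes only a couple of clipped bits, whereas the same value can dominate a distance computed on the raw series. That is one intuition for why clipping can help when outliers are present.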