Clustering time series is a problem that has applications in a wide variety of fields and has recently attracted a large amount of research. Time series data are often large and may contain outliers. We show that the simple procedure of clipping the time series (discretising each value to above or below the median) reduces memory requirements and significantly speeds up clustering without decreasing clustering accuracy. We also demonstrate that clipping increases clustering accuracy when there are outliers in the data, so that it serves both as a means of outlier detection and as a method of identifying model misspecification. We consider simulated data from polynomial, autoregressive moving average and hidden Markov models and show that the estimated parameters of the clipped data used in clustering tend, asymptotically, to those of the unclipped data. We also demonstrate experimentally that, if the series are long enough, the accuracy on clipped data is not significantly less than the accuracy on unclipped data, and that if the series contain outliers then clipping results in significantly better clusterings. We then illustrate how using clipped series can be of practical benefit in detecting model misspecification and outliers on two real-world data sets: an electricity generation bid data set and an ECG data set.
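The clipping step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `clip_series` and `pack_bits`, the example values, and the choice to map values equal to the median to 0 are all assumptions made for the example.

```python
from statistics import median

def clip_series(values):
    """Clip a series to 0/1: 1 if a value is above the series median, else 0.

    Assumption: values equal to the median are mapped to 0.
    """
    m = median(values)
    return [1 if v > m else 0 for v in values]

def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 per byte, zero-padding the last byte."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad a short final chunk
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

# Hypothetical series with one large outlier (250.0).
series = [0.2, 1.5, -0.7, 0.9, 250.0, -1.1, 0.4, -0.3]
clipped = clip_series(series)
packed = pack_bits(clipped)  # eight observations stored in a single byte
```

In this sketch, replacing the outlier 250.0 with an ordinary value such as 2.5 leaves the clipped series unchanged, which gives some intuition for why clipping is insensitive to outliers. Packing eight observations into each byte illustrates the memory saving.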