Clustering time series from ARMA models with clipped data

Authors:
A. J. Bagnall; G. J. Janacek
Affiliations:
University of East Anglia, Norwich, England; University of East Anglia, Norwich, England
Venue:
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2004

Abstract

Clustering time series is a problem with applications in a wide variety of fields, and it has recently attracted a large amount of research. In this paper we focus on clustering data derived from Autoregressive Moving Average (ARMA) models using the k-means and k-medoids algorithms, with the Euclidean distance between estimated model parameters as the distance measure. We justify our choice of clustering technique and distance metric by reproducing results obtained in related research. Our research aim is to assess the effects on the clustering of time series of discretising data into binary sequences that record whether each value lies above or below the median, a process known as clipping. It is known that the fitted AR parameters of clipped data tend asymptotically to the parameters for unclipped data. We exploit this result to demonstrate that, for long series, the clustering accuracy when using clipped data from the class of ARMA models is not significantly different from that achieved with unclipped data. Next we show that if the data contain outliers, then using clipped data produces significantly better clusterings. We then demonstrate that using clipped series requires much less memory and that operations such as distance calculations can be much faster. Finally, we demonstrate these advantages on three real-world data sets.
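
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`clip`, `fit_ar`, `pack_bits`, `kmeans`, `simulate_ar1`), the AR(1) test models, the series length, and the use of Levinson-Durbin recursion to fit the AR parameters are all assumptions made for illustration. The sketch clips each series at its median, fits an AR model to the clipped sequence, and runs k-means on the fitted parameters with Euclidean distance.

```python
import math
import random
import statistics


def clip(series):
    """Clip a series: 1 where the value is above the median, else 0."""
    med = statistics.median(series)
    return [1 if x > med else 0 for x in series]


def pack_bits(bits):
    """Store a clipped series at one bit per observation (illustrative)."""
    out = bytearray((len(bits) + 7) // 8)
    for i, b in enumerate(bits):
        if b:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def fit_ar(series, p):
    """Fit AR(p) coefficients from sample autocovariances via
    Levinson-Durbin recursion (an assumed estimator, not the paper's)."""
    n = len(series)
    mean = sum(series) / n
    x = [v - mean for v in series]
    r = [sum(x[t] * x[t + k] for t in range(n - k)) / n for k in range(p + 1)]
    phi, err = [], r[0]
    for k in range(1, p + 1):
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / err
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return phi


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means with Euclidean distance between parameter vectors."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def simulate_ar1(phi, n, rng, burn_in=100):
    """Simulate a Gaussian AR(1) series, discarding a burn-in period."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            out.append(x)
    return out


# Hypothetical data: two groups of AR(1) series with different parameters.
rng = random.Random(42)
series = ([simulate_ar1(0.8, 500, rng) for _ in range(5)]
          + [simulate_ar1(-0.6, 500, rng) for _ in range(5)])
clipped = [clip(s) for s in series]
params = [fit_ar(c, 1) for c in clipped]   # parameters fitted to clipped data
labels = kmeans(params, 2)
print(labels)
print(len(pack_bits(clipped[0])), "bytes for", len(clipped[0]), "observations")
```

In this sketch the two groups are deliberately well separated, so it only shows the mechanics; it says nothing about the accuracy comparisons reported in the paper. Packing the clipped series into bytes is one simple way to see why memory use falls, since each observation needs a single bit rather than a floating-point value.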