Distance Measures for Effective Clustering of ARIMA Time-Series

  • Authors:
  • Konstantinos Kalpakis; Dhiral Gada; Vasundhara Puttagunta

  • Venue:
  • ICDM '01 Proceedings of the 2001 IEEE International Conference on Data Mining
  • Year:
  • 2001

Abstract

Many environmental and socioeconomic time-series can be adequately modeled using Auto-Regressive Integrated Moving Average (ARIMA) models. We call such time-series ARIMA time-series. We consider the problem of clustering ARIMA time-series. We propose the use of the Linear Predictive Coding (LPC) cepstrum of time-series for clustering ARIMA time-series, using the Euclidean distance between the LPC cepstra of two time-series as their dissimilarity measure. We demonstrate that LPC cepstral coefficients have the desired features for accurate clustering and efficient indexing of ARIMA time-series. For example, a few LPC cepstral coefficients are sufficient to discriminate between time-series that are modeled by different ARIMA models. In fact, this approach requires fewer coefficients than traditional approaches, such as DFT and DWT. The proposed distance measure can be used for measuring the similarity between different ARIMA models as well. We cluster ARIMA time-series using the Partition Around Medoids method with various similarity measures. We present experimental results demonstrating that the proposed measure achieves significantly better clusterings of ARIMA time-series data than those obtained with other traditional similarity measures, such as DFT, DWT, and PCA. Experiments were performed on both simulated and real data.
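
The dissimilarity measure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes each series has already been fitted with an AR(p) model written as x_t + a_1 x_{t-1} + ... + a_p x_{t-p} = e_t (this sign convention is an assumption), and it uses the standard recursion that derives LPC cepstral coefficients from AR coefficients. The function names `lpc_cepstrum` and `cepstral_distance` and the default of 8 coefficients are hypothetical choices made for this example.

```python
import math


def lpc_cepstrum(ar_coeffs, n_coeffs):
    """Return the first n_coeffs LPC cepstral coefficients c_1..c_n.

    ar_coeffs holds a_1..a_p for the assumed model
    x_t + a_1 x_{t-1} + ... + a_p x_{t-p} = e_t.
    """
    p = len(ar_coeffs)
    cepstrum = []
    for n in range(1, n_coeffs + 1):
        # Direct term -a_n exists only while n <= p.
        value = -ar_coeffs[n - 1] if n <= p else 0.0
        # Recursive term: sum over m = 1 .. min(n - 1, p).
        for m in range(1, min(n - 1, p) + 1):
            value -= (1.0 - m / n) * ar_coeffs[m - 1] * cepstrum[n - m - 1]
        cepstrum.append(value)
    return cepstrum


def cepstral_distance(ar_a, ar_b, n_coeffs=8):
    """Euclidean distance between the LPC cepstra of two AR models."""
    ca = lpc_cepstrum(ar_a, n_coeffs)
    cb = lpc_cepstrum(ar_b, n_coeffs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


if __name__ == "__main__":
    # Two illustrative AR(1) models and one AR(2) model (made-up coefficients).
    model_a = [-0.5]
    model_b = [-0.6]
    model_c = [-0.5, 0.3]
    print(cepstral_distance(model_a, model_b))
    print(cepstral_distance(model_a, model_c))
```

For an AR(1) model x_t = φ x_{t-1} + e_t (so a_1 = -φ), the recursion reduces to c_n = φ^n / n, which gives a quick sanity check on the sketch. A pairwise matrix of such distances could then be passed to any medoid-based clustering routine, such as the Partition Around Medoids method named in the abstract.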