Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data

  • Authors:
  • Thanawin Rakthanmanon;Eamonn J. Keogh;Stefano Lonardi;Scott Evans

  • Affiliations:
  • -;-;-;-

  • Venue:
  • ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Given the pervasiveness of time series data in all human endeavors, and the ubiquity of clustering as a data mining application, it is somewhat surprising that the problem of time series clustering from a single stream remains largely unsolved. Most work on time series clustering considers the clustering of individual time series, e.g., gene expression profiles, individual heartbeats or individual gait cycles. The few attempts at clustering time series streams have been shown to be objectively incorrect in some cases, and in other cases shown to work only on the most contrived datasets by carefully adjusting a large set of parameters. In this work, we make two fundamental contributions. First, we show that the problem definition for time series clustering from streams currently used is inherently flawed, and a new definition is necessary. Second, we show that the Minimum Description Length (MDL) framework offers an efficient, effective and essentially parameter-free method for time series clustering. We show that our method produces objectively correct results on a wide variety of datasets from medicine, zoology and industrial process analyses.