Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data

Authors:
Thanawin Rakthanmanon; Eamonn J. Keogh; Stefano Lonardi; Scott Evans
Venue:
ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
Year:
2011

Cited by (4):
Small gestures go a long way: how many bits per gesture do recognizers actually need?
Proceedings of the Designing Interactive Systems Conference

The impact of motion dimensionality and bit cardinality on the design of 3D gesture recognizers
International Journal of Human-Computer Studies

Mining characteristic multi-scale motifs in sensor-based time series
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Quantitative Analysis of Live-Cell Growth at the Shoot Apex of Arabidopsis thaliana: Algorithms for Feature Measurement and Temporal Alignment
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

Abstract

Given the pervasiveness of time series data in all human endeavors, and the ubiquity of clustering as a data mining application, it is somewhat surprising that the problem of time series clustering from a single stream remains largely unsolved. Most work on time series clustering considers the clustering of individual time series, e.g., gene expression profiles, individual heartbeats or individual gait cycles. The few attempts at clustering time series streams have been shown to be objectively incorrect in some cases, and in other cases to work only on the most contrived datasets, and only after carefully tuning a large set of parameters. In this work, we make two fundamental contributions. First, we show that the problem definition currently used for clustering time series streams is inherently flawed, and that a new definition is necessary. Second, we show that the Minimum Description Length (MDL) framework offers an efficient, effective and essentially parameter-free method for time series clustering. We show that our method produces objectively correct results on a wide variety of datasets from medicine, zoology and industrial process analyses.
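The abstract credits MDL with supplying an essentially parameter-free criterion for clustering. As a rough illustration of the underlying idea, and not the authors' algorithm, the sketch below assumes that the stream is discretized to a fixed cardinality, that the cost of a subsequence is its length times its empirical entropy in bits, and that encoding a subsequence relative to a candidate cluster center costs the description length of their element-wise difference. The names `discretize`, `description_length`, `conditional_dl` and `bits_saved` are hypothetical, and `cardinality=8` is an assumed setting.

```python
import math
from collections import Counter


def discretize(stream, cardinality=8):
    """Map real values to integers 0..cardinality-1 by min-max scaling.

    Assumption: the whole stream is scaled at once so that every
    subsequence shares the same alphabet.
    """
    lo, hi = min(stream), max(stream)
    if hi == lo:
        return [0] * len(stream)
    step = (hi - lo) / (cardinality - 1)
    return [round((x - lo) / step) for x in stream]


def description_length(symbols):
    """Bits to encode a symbol sequence: n * H = sum(c * log2(n / c))."""
    n = len(symbols)
    counts = Counter(symbols)
    return sum(c * math.log2(n / c) for c in counts.values())


def conditional_dl(a, h):
    """Bits to encode `a` given hypothesis `h` (assumed equal length):
    the description length of their element-wise difference."""
    return description_length([x - y for x, y in zip(a, h)])


def bits_saved(a, h):
    """Bits saved by encoding `a` relative to `h` instead of on its own."""
    return description_length(a) - conditional_dl(a, h)


if __name__ == "__main__":
    pattern = [0.0, 0.4, 0.9, 1.0, 0.6, 0.1]   # hypothetical repeated shape
    filler = [0.3, 0.8, 0.2, 0.7, 0.55, 0.9]   # hypothetical unrelated data
    s = discretize(pattern + filler + pattern, cardinality=8)
    first, middle, second = s[0:6], s[6:12], s[12:18]
    print("saved using the earlier repeat:", bits_saved(second, first))
    print("saved using the filler:", bits_saved(second, middle))
```

Under these assumptions the second occurrence of the pattern costs zero extra bits given the first, while the filler saves fewer bits. A greedy clusterer built on such a score could simply decline to cluster subsequences that save no bits, which is one way to read the title's claim that clustering streams requires ignoring some data.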