Clustering of time-series subsequences is meaningless: implications for previous and future research

Authors:
Eamonn Keogh; Jessica Lin
Affiliations:
Computer Science & Engineering Department, University of California, Riverside, Riverside, CA 92521, USA; Computer Science & Engineering Department, University of California, Riverside, Riverside, CA 92521, USA
Venue:
Knowledge and Information Systems
Year:
2005

Cited by 27

Visually mining and monitoring massive time series
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Visualizing and discovering non-trivial patterns in large time series databases
Information Visualization

In search of meaning for time series subsequence clustering: matching algorithms based on a new distance measure
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management

Efficient query filtering for streaming time series with applications to semisupervised learning of time series classifiers
Knowledge and Information Systems

Numerical time-series pattern extraction based on irregular piecewise aggregate approximation and gradient specification
New Generation Computing

Mining sequential patterns across time sequences
New Generation Computing

Temporal pattern matching for the prediction of stock prices
AIDM '07 Proceedings of the 2nd international workshop on Integrating artificial intelligence and data mining - Volume 84

Advances in clustering and visualization of time series using GTM through time
Neural Networks

Characterizing individual communication patterns
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

On privacy in time series data mining
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining

DUST: a generalized notion of similarity between uncertain time series
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

A new class of attacks on time series data mining
Intelligent Data Analysis

Lag patterns in time series databases
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II

Increasing availability of industrial systems through data stream mining
Computers and Industrial Engineering

Weighted dynamic time warping for time series classification
Pattern Recognition

Traffic events modeling for structural health monitoring
IDA'11 Proceedings of the 10th international conference on Advances in intelligent data analysis X

Visual data mining for identification of patterns and outliers in weather stations' data
IDEAL'12 Proceedings of the 13th international conference on Intelligent Data Engineering and Automated Learning

Short communication: Selective Subsequence Time Series clustering
Knowledge-Based Systems

Time-series data mining
ACM Computing Surveys (CSUR)

Feature selection for classification of oscillating time series
Expert Systems: The Journal of Knowledge Engineering

Incremental Algorithm for Discovering Frequent Subsequences in Multiple Data Streams
International Journal of Data Warehousing and Mining

Preserving Privacy in Time Series Data Mining
International Journal of Data Warehousing and Mining

Weighted spherical 1-mean with phase shift and its application in electrocardiogram discord detection
Artificial Intelligence in Medicine

DTW-D: time series semi-supervised learning from a single example
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Time series symbolization and search for frequent patterns
Proceedings of the Fourth Symposium on Information and Communication Technology

A new similarity measure based on shape information for invariant with multiple distortions
Neurocomputing

Abstract

Given the recent explosion of interest in streaming data and online algorithms, clustering of time-series subsequences extracted via a sliding window has received much attention. In this work, we make a surprising claim: clustering of time-series subsequences is meaningless. More concretely, clusters extracted from these time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be demonstrated intuitively with a simple illustration and is simple to prove, it has never appeared in the literature. We can justify calling our claim surprising because it invalidates the contribution of dozens of previously published papers. We support our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work. Although the primary contribution of our work is to draw attention to the fact that an apparent solution to an important problem is incorrect and should no longer be used, we also introduce a novel method, based on the concept of time-series motifs, that is able to meaningfully cluster subsequences on some time-series datasets.
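The kind of constraint the abstract refers to can be illustrated numerically. The following is a minimal Python sketch, not the paper's code; it assumes raw (un-normalized) subsequences, a step-1 sliding window, and plain Lloyd's k-means standing in for "any clustering algorithm". It checks two facts: the size-weighted average of the k cluster centers always equals the mean of all subsequences, and when the window length w is much shorter than the series length m, that mean is close to a flat line. Together these mean the cluster centers are forced to average out to an almost constant shape whatever the data look like. The function names and the random-walk test series are hypothetical.

```python
import random


def sliding_windows(series, w):
    """Every length-w subsequence of `series`, taken with a step-1 sliding window."""
    return [series[i:i + w] for i in range(len(series) - w + 1)]


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=10, seed=0):
    """Plain Lloyd's k-means; returns (centers, sizes) for the final assignment."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each non-empty cluster's center moves to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean_vector(members)
    sizes = [labels.count(c) for c in range(k)]
    return centers, sizes


def weighted_center_mean(centers, sizes):
    """Average of the cluster centers, each weighted by its cluster size."""
    total = sum(sizes)
    return [sum(s * ctr[j] for ctr, s in zip(centers, sizes)) / total
            for j in range(len(centers[0]))]


# Hypothetical test data: a Gaussian random walk of length m = 1000.
rng = random.Random(42)
series = [0.0]
for _ in range(999):
    series.append(series[-1] + rng.gauss(0.0, 1.0))

w = 32                                   # window length, w << m
windows = sliding_windows(series, w)
overall = mean_vector(windows)           # the "k = 1" cluster center
centers, sizes = kmeans(windows, k=3)
weighted = weighted_center_mean(centers, sizes)

# (1) Whatever clusters k-means finds, their size-weighted mean is the overall mean.
max_gap = max(abs(a - b) for a, b in zip(weighted, overall))
# (2) The overall mean is nearly flat: any two of its entries differ by at most
#     (w - 1) * range(series) / (m - w + 1).
spread = max(overall) - min(overall)
bound = (w - 1) * (max(series) - min(series)) / (len(series) - w + 1)
print(f"max gap = {max_gap:.2e}, spread = {spread:.3f}, bound = {bound:.3f}")
```

Changing the seed, k, or the test series changes the clusters but not the outcome of either check: fact (1) holds for any partition whose centers are cluster means, and the bound in (2) follows because adjacent entries of the overall mean differ only by one value entering and one value leaving a running sum of m - w + 1 terms.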