In search of meaning for time series subsequence clustering: matching algorithms based on a new distance measure

Authors:
Dina Goldin;Ricardo Mardales;George Nagy
Affiliations:
Brown University, Providence, RI;University of Connecticut, Storrs, CT;Rensselaer Polytechnic Inst., Troy, NY
Venue:
CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
Year:
2006

Citing 13
Cited 5

Decoding Substitution Ciphers by Means of Word Matching with Application to OCR

IEEE Transactions on Pattern Analysis and Machine Intelligence
Algorithms for clustering data
Vector quantization and signal compression
Searching Multimedia Databases by Content
On Similarity Queries for Time-Series Data: Constraint Specification and Implementation

CP '95 Proceedings of the First International Conference on Principles and Practice of Constraint Programming
Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research

ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Pattern Classification (2nd Edition)
Clustering of time-series subsequences is meaningless: implications for previous and future research

Knowledge and Information Systems
Clustering Ensembles: Models of Consensus and Weak Partitions

IEEE Transactions on Pattern Analysis and Machine Intelligence
Making Subsequence Time Series Clustering Meaningful

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Kernel-Density-Based Clustering of Time Series Subsequences Using a Continuous Random-Walk Noise Model

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Pattern Recognition, Third Edition
Why does subsequence time-series clustering produce sine waves?

PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases

Useful clustering outcomes from meaningful time series clustering

AusDM '07 Proceedings of the sixth Australasian conference on Data mining and analytics - Volume 70
Data mining of vector–item patterns using neighborhood histograms

Knowledge and Information Systems
Post-processing in wireless sensor networks: benchmarking sensor trace files

Proceedings of the 7th ACM workshop on Performance evaluation of wireless ad hoc, sensor, and ubiquitous networks
Identifying predictive multi-dimensional time series motifs: an application to severe weather prediction

Data Mining and Knowledge Discovery
Post-processing in wireless sensor networks: Benchmarking sensor trace files for in-network data aggregation

Journal of Network and Computer Applications

Abstract

Recent papers have claimed that the result of K-means clustering for time series subsequences (STS clustering) is independent of the time series that created it. Our paper revisits this claim. In particular, we consider the following question: Given several time series sequences and a set of STS cluster centroids from one of them (generated by the K-means algorithm), is it possible to reliably determine which of the sequences produced these cluster centroids? While recent results suggest that the answer should be NO, we answer this question in the affirmative.

We present cluster shape distance, an alternate distance measure for time series subsequence clusters, based on cluster shapes. Given a set of clusters, its shape is the sorted list of the pairwise Euclidean distances between their centroids. We then present two algorithms based on this distance measure, which match a set of STS cluster centroids with the time series that produced it. While the first algorithm creates smaller "fingerprints" for the sequences, the second is more accurate. In our experiments with a dataset of 10 sequences, it produced a correct match 100% of the time.

Furthermore, we offer an analysis that explains why our cluster shape distance provides a reliable way to match STS clusters to the original sequences, whereas cluster set distance fails to do so. Our work establishes for the first time a strong relation between the result of K-means STS clustering and the time series sequence that created it, despite earlier predictions that this is not possible.
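
The cluster shape defined in the abstract (the sorted list of pairwise Euclidean distances between centroids) can be illustrated with a minimal Python sketch. The abstract does not say how two shapes are compared or how the two matching algorithms work, so `shape_distance` (Euclidean distance between the two sorted lists) and `best_match` (nearest shape wins) are assumptions made only for illustration, not the authors' algorithms. All function names and the toy centroids are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cluster_shape(centroids):
    """Shape of a cluster set: sorted pairwise distances between centroids."""
    return sorted(euclidean(a, b) for a, b in combinations(centroids, 2))


def shape_distance(shape_a, shape_b):
    """ASSUMPTION: compare two shapes by the Euclidean distance between them."""
    if len(shape_a) != len(shape_b):
        raise ValueError("shapes must come from the same number of clusters")
    return euclidean(shape_a, shape_b)


def best_match(query_centroids, candidates):
    """ASSUMPTION: pick the candidate whose shape is nearest to the query's.

    `candidates` maps a sequence name to the centroids obtained from it.
    """
    query_shape = cluster_shape(query_centroids)
    return min(
        candidates,
        key=lambda name: shape_distance(query_shape, cluster_shape(candidates[name])),
    )


# Toy centroids (made up for illustration, not data from the paper).
query = [(0, 0), (3, 4), (6, 8)]
candidates = {
    "seq_a": [(6, 8), (0, 0), (3, 4)],  # same centroids, different order
    "seq_b": [(0, 0), (1, 0), (0, 1)],
}
match = best_match(query, candidates)
```

Because the shape is a sorted list of distances, it does not depend on the order in which the centroids are listed, which is why `seq_a` matches the query even though its centroids are permuted.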