Recent papers have claimed that the result of K-means clustering for time series subsequences (STS clustering) is independent of the time series that created it. Our paper revisits this claim. In particular, we consider the following question: Given several time series sequences and a set of STS cluster centroids from one of them (generated by the K-means algorithm), is it possible to reliably determine which of the sequences produced these cluster centroids? While recent results suggest that the answer should be NO, we answer this question in the affirmative.

We present cluster shape distance, an alternate distance measure for time series subsequence clusters, based on cluster shapes. Given a set of clusters, its shape is the sorted list of the pairwise Euclidean distances between their centroids. We then present two algorithms based on this distance measure, which match a set of STS cluster centroids with the time series that produced it. While the first algorithm creates smaller "fingerprints" for the sequences, the second is more accurate. In our experiments with a dataset of 10 sequences, it produced a correct match 100% of the time.

Furthermore, we offer an analysis that explains why our cluster shape distance provides a reliable way to match STS clusters to the original sequences, whereas cluster set distance fails to do so. Our work establishes for the first time a strong relation between the result of K-means STS clustering and the time series sequence that created it, despite earlier predictions that this is not possible.
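
The definition of a cluster shape can be illustrated with a short Python sketch. Only `cluster_shape` follows the abstract directly (the sorted list of pairwise Euclidean distances between centroids). The abstract does not say how two shapes are compared or how the two matching algorithms work, so `shape_distance` (Euclidean distance between the two sorted lists) and `best_match` (nearest-shape lookup) are assumptions made for illustration, not the authors' algorithms; all function names are hypothetical.

```python
import math
from itertools import combinations


def cluster_shape(centroids):
    """Return the sorted list of pairwise Euclidean distances between centroids.

    This mirrors the definition of a cluster shape given in the abstract.
    """
    return sorted(math.dist(a, b) for a, b in combinations(centroids, 2))


def shape_distance(shape_a, shape_b):
    """Compare two shapes of equal length.

    Assumption: the abstract does not specify this comparison; here it is
    the Euclidean distance between the two sorted distance lists.
    """
    if len(shape_a) != len(shape_b):
        raise ValueError("shapes must come from the same number of clusters")
    return math.dist(shape_a, shape_b)


def best_match(query_centroids, candidates):
    """Return the name of the candidate whose shape is closest to the query.

    `candidates` maps a sequence name to the centroids obtained from it.
    Assumption: a plain nearest-shape lookup, not the paper's algorithms.
    """
    query = cluster_shape(query_centroids)
    return min(
        candidates,
        key=lambda name: shape_distance(query, cluster_shape(candidates[name])),
    )
```

Because a shape depends only on distances between centroids, it does not change when the centroids are reordered or when all of them are shifted by the same offset. For example, `best_match([(0, 0), (3, 4), (0, 1)], {"a": [(10, 10), (13, 14), (10, 11)], "b": [(0, 0), (1, 0), (0, 1)]})` selects `"a"`, whose centroids are a translated copy of the query.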