Clustering of Time Series Subsequences is Meaningless: Implications for Previous and Future Research

Authors:
Eamonn Keogh; Jessica Lin; Wagner Truppel
Venue:
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Year:
2003

Abstract

Time series data is perhaps the most frequently encountered type of data examined by the data mining community. Clustering is perhaps the most frequently used data mining algorithm, being useful in its own right as an exploratory technique, and also as a subroutine in more complex data mining algorithms such as rule discovery, indexing, summarization, anomaly detection, and classification. Given these two facts, it is hardly surprising that time series clustering has attracted much attention. The data to be clustered can be in one of two formats: many individual time series, or a single time series, from which individual time series are extracted with a sliding window. Given the recent explosion of interest in streaming data and online algorithms, the latter case has received much attention.

In this work we make a surprising claim. Clustering of streaming time series is completely meaningless. More concretely, clusters extracted from streaming time series are forced to obey a certain constraint that is pathologically unlikely to be satisfied by any dataset, and because of this, the clusters extracted by any clustering algorithm are essentially random. While this constraint can be intuitively demonstrated with a simple illustration and is simple to prove, it has never appeared in the literature.

We can justify calling our claim surprising, since it invalidates the contribution of dozens of previously published papers. We will justify our claim with a theorem, illustrative examples, and a comprehensive set of experiments on reimplementations of previous work.
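
The second data format mentioned in the abstract, subsequences extracted from a single series with a sliding window, can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `sliding_window_subsequences`, the sine-wave test series, and the window length 20 are all assumptions chosen for the example. The last lines print how much the average of all windows varies across positions. For a window that is short relative to the series this spread is small, which gives a feel for how strongly overlapping windows constrain one another. The abstract does not state its constraint, so this check should not be read as a statement of it.

```python
import numpy as np


def sliding_window_subsequences(series, w):
    """Return every length-w subsequence of a 1-D series, one per row.

    Hypothetical helper for illustration; the name is not from the paper.
    """
    series = np.asarray(series, dtype=float)
    n = series.size
    if w < 1 or w > n:
        raise ValueError("window length must be between 1 and len(series)")
    # Row i holds series[i : i + w]; there are n - w + 1 such rows.
    starts = np.arange(n - w + 1)[:, None]
    offsets = np.arange(w)[None, :]
    return series[starts + offsets]


# Assumed toy input: a sine wave with period 50 sampled at 1000 points.
t = np.arange(1000)
series = np.sin(2 * np.pi * t / 50)

subsequences = sliding_window_subsequences(series, w=20)
print(subsequences.shape)  # (981, 20)

# Average all windows position by position and measure the spread
# (max minus min) of that average across the 20 positions.
window_mean = subsequences.mean(axis=0)
print(np.ptp(window_mean))
```

The rows of `subsequences` are what a clustering algorithm would receive as input in the sliding-window setting. Adjacent rows share all but one value, so they are far from independent samples.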