Time series shapelets: a novel technique that allows accurate, interpretable and fast classification

Authors: Lexiang Ye; Eamonn Keogh
Affiliation (both authors): Department of Computer Science and Engineering, University of California, Riverside, USA 92521
Venue: Data Mining and Knowledge Discovery
Year: 2011

Abstract

Classification of time series has attracted great interest over the past decade. Although dozens of techniques have been introduced, recent empirical evidence strongly suggests that the simple nearest neighbor algorithm is very difficult to beat for most time series problems, especially for large-scale datasets. This may be considered good news, given how simple the nearest neighbor algorithm is to implement, but it has some negative consequences. First, the nearest neighbor algorithm requires storing and searching the entire dataset, resulting in high time and space complexity that limits its applicability, especially on resource-limited sensors. Second, beyond mere classification accuracy, we often wish to gain some insight into the data and to make the classification result more explainable, which the global characteristics of the nearest neighbor algorithm cannot provide. In this work we introduce a new time series primitive, time series shapelets, which addresses these limitations. Informally, shapelets are time series subsequences that are in some sense maximally representative of a class. We can classify objects using the distance to the shapelet, rather than the distance to the nearest neighbor. As we shall show with extensive empirical evaluations in diverse domains, classification algorithms based on the time series shapelet primitive can be interpretable, more accurate, and significantly faster than state-of-the-art classifiers.
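The central idea, classifying by distance to a shapelet instead of distance to the nearest neighbor, can be illustrated with a small brute-force sketch. It assumes that the distance between a shapelet and a series is the minimum Euclidean distance over all equal-length windows of the series (raw values, no normalization). It also assumes that only one candidate length is tried and that each candidate is scored by the information gain of the best threshold on those distances. This is an illustration under those assumptions, not the authors' implementation, and it makes no attempt to reproduce any speedups the paper may use. The helper names (`subsequence_distance`, `best_split`, `find_shapelet`, `ShapeletRule`) and the toy data are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def subsequence_distance(shapelet: Sequence[float], series: Sequence[float]) -> float:
    """Minimum Euclidean distance between the shapelet and every window of
    the series with the same length (raw values, no z-normalization)."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best = min(best, d)
    return math.sqrt(best)


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(dists: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Best information gain over thresholds placed midway between adjacent
    distinct distances. Returns (-inf, 0.0) if all distances are equal."""
    order = sorted(range(len(dists)), key=lambda i: dists[i])
    n, base = len(labels), entropy(labels)
    best_gain, best_thr = -math.inf, 0.0
    for k in range(1, n):
        lo, hi = dists[order[k - 1]], dists[order[k]]
        if lo == hi:  # no threshold can separate equal distances
            continue
        left = [labels[i] for i in order[:k]]
        right = [labels[i] for i in order[k:]]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_gain, best_thr = gain, (lo + hi) / 2
    return best_gain, best_thr


@dataclass
class ShapeletRule:
    shapelet: List[float]
    threshold: float
    near_label: int  # majority class among training series within the threshold
    far_label: int   # majority class among training series beyond it
    gain: float

    def predict(self, series: Sequence[float]) -> int:
        d = subsequence_distance(self.shapelet, series)
        return self.near_label if d <= self.threshold else self.far_label


def find_shapelet(data: List[List[float]], labels: List[int], length: int) -> ShapeletRule:
    """Try every subsequence of the given length as a candidate shapelet and
    keep the one whose distance split has the highest information gain."""
    best: Optional[ShapeletRule] = None
    for series in data:
        for start in range(len(series) - length + 1):
            cand = list(series[start:start + length])
            dists = [subsequence_distance(cand, s) for s in data]
            gain, thr = best_split(dists, labels)
            if gain == -math.inf:
                continue  # every series is equally far from this candidate
            if best is None or gain > best.gain:
                near = Counter(y for d, y in zip(dists, labels) if d <= thr)
                far = Counter(y for d, y in zip(dists, labels) if d > thr)
                best = ShapeletRule(cand, thr, near.most_common(1)[0][0],
                                    far.most_common(1)[0][0], gain)
    if best is None:
        raise ValueError("no candidate produces a valid split")
    return best


if __name__ == "__main__":
    # Hypothetical toy data: class 1 series contain a jump from 0 up to 5.
    data = [
        [0, 0, 0, 5, 5, 0, 0, 0],
        [0, 5, 5, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 5, 5, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 0, 0],
    ]
    labels = [1, 1, 1, 0, 0, 0]
    rule = find_shapelet(data, labels, length=2)
    print(rule.shapelet, rule.threshold)  # [0, 5] 2.0
    print(rule.predict([0, 0, 5, 5, 0, 0, 0, 0]))  # 1
```

Once trained, the rule keeps only one shapelet and one threshold, so classifying a new series costs a single subsequence-distance scan rather than a search over the whole training set. This mirrors the storage and speed advantage the abstract contrasts with the nearest neighbor algorithm, and the learned subsequence itself is what a person can inspect to see why a series was assigned to a class.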