Data mining a trillion time series subsequences under dynamic time warping

Authors:
Thanawin Rakthanmanon;Bilson Campana;Abdullah Mueen;Gustavo Batista;Brandon Westover;Qiang Zhu;Jesin Zakaria;Eamonn Keogh
Affiliations:
Kasetsart University;UC Riverside;UC Riverside;University of São Paulo;Brigham and Women's Hospital;UC Riverside;UC Riverside;UC Riverside
Venue:
IJCAI'13 Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
Year:
2013

Abstract

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all of them. The difficulty of scaling search to large datasets largely explains why most academic work on time series data mining has plateaued at a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that, by combining four novel ideas, we can search and mine truly massive time series for the first time. We demonstrate the following extremely unintuitive fact: in large datasets we can search exactly under dynamic time warping (DTW) much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. We show that our ideas allow us to solve higher-level time series data mining problems at scales that would otherwise be untenable.