Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping

Authors:
Thanawin Rakthanmanon;Bilson Campana;Abdullah Mueen;Gustavo Batista;Brandon Westover;Qiang Zhu;Jesin Zakaria;Eamonn Keogh
Affiliations:
University of California Riverside and Kasetsart University;University of California Riverside;University of California Riverside;University of São Paulo;Brigham and Women’s Hospital;University of California Riverside;University of California Riverside;University of California Riverside
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on ACM SIGKDD 2012
Year:
2013

Abstract

Most time series data mining algorithms use similarity search as a core subroutine, and thus the time taken for similarity search is the bottleneck for virtually all time series data mining algorithms, including classification, clustering, motif discovery, anomaly detection, and so on. The difficulty of scaling a search to large datasets explains to a great extent why most academic work on time series data mining has plateaued at considering a few million time series objects, while much of industry and science sits on billions of time series objects waiting to be explored. In this work we show that by using a combination of four novel ideas we can search and mine massive time series for the first time. We demonstrate the following unintuitive fact: in large datasets we can exactly search under Dynamic Time Warping (DTW) much more quickly than the current state-of-the-art Euclidean distance search algorithms. We demonstrate our work on the largest set of time series experiments ever attempted. In particular, the largest dataset we consider is larger than the combined size of all of the time series datasets considered in all data mining papers ever published. We explain how our ideas allow us to solve higher-level time series data mining problems such as motif discovery and clustering at scales that would otherwise be untenable. Moreover, we show how our ideas allow us to efficiently support the uniform scaling distance measure, a measure whose utility seems to be underappreciated, but which we demonstrate here. In addition to mining massive datasets with up to one trillion datapoints, we will show that our ideas also have implications for real-time monitoring of data streams, allowing us to handle much faster arrival rates and/or use cheaper and lower-powered devices than are currently possible.
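
For readers unfamiliar with the problem setting, the following is a minimal sketch of exact subsequence search under DTW. It is not the authors' method: the abstract does not enumerate the four ideas, so this shows only a plain baseline. The function names (`znorm`, `dtw`, `best_match`), the warping-band parameter `band`, the per-window z-normalization, and the simple row-wise early abandoning are all assumptions made for illustration.

```python
import math


def znorm(x):
    """Z-normalize a sequence to zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if std == 0.0:
        return [0.0] * n  # constant window: map to all zeros
    return [(v - mean) / std for v in x]


def dtw(a, b, band, best_so_far=math.inf):
    """Squared DTW distance between equal-length sequences a and b.

    Warping is limited to |i - j| <= band. If every cell in a row is
    already >= best_so_far, the final distance cannot be smaller, so
    the computation is abandoned and inf is returned.
    """
    n = len(a)
    prev = [math.inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [math.inf] * (n + 1)
        lo, hi = max(1, i - band), min(n, i + band)
        for j in range(lo, hi + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        if min(curr[lo:hi + 1]) >= best_so_far:
            return math.inf  # early abandon
        prev = curr
    return prev[n]


def best_match(series, query, band):
    """Return (position, distance) of the window of series closest to query."""
    m = len(query)
    q = znorm(query)
    best_dist, best_pos = math.inf, -1
    for start in range(len(series) - m + 1):
        window = znorm(series[start:start + m])
        d = dtw(q, window, band, best_dist)
        if d < best_dist:
            best_dist, best_pos = d, start
    return best_pos, best_dist


if __name__ == "__main__":
    series = [0.0, 0.1, 0.0, 1.0, 2.0, 4.0, 2.0, 1.0, 0.0, 0.1]
    query = [2.0, 4.0, 8.0, 4.0, 2.0]  # same shape as series[3:8], scaled by 2
    print(best_match(series, query, band=1))
```

This baseline re-normalizes every window and runs a dynamic program of roughly m × (2·band + 1) cells per window, which is the kind of cost that makes naive search impractical at the scales the abstract describes.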