A disk-aware algorithm for time series motif discovery

Authors:
Abdullah Mueen;Eamonn Keogh;Qiang Zhu;Sydney S. Cash;M. Brandon Westover;Nima Bigdely-Shamlo
Affiliations:
Department of Computer Science & Engineering, University of California, Riverside, USA;Department of Computer Science & Engineering, University of California, Riverside, USA;Department of Computer Science & Engineering, University of California, Riverside, USA;Massachusetts General Hospital, Harvard Medical School, Boston, USA;Massachusetts General Hospital, Brigham and Women's Hospital, Boston, USA;Swartz Center for Computational Neuroscience, University of California, San Diego, USA
Venue:
Data Mining and Knowledge Discovery
Year:
2011

Abstract

Time series motifs are sets of very similar subsequences of a long time series. They are of interest in their own right, and are also used as inputs to several higher-level data mining algorithms, including classification, clustering, rule discovery and summarization. In spite of extensive research in recent years, finding time series motifs exactly in massive databases remains an open problem. Previous efforts either found approximate motifs or considered relatively small datasets residing in main memory. In this work, we build on previous work on pivot-based indexing to introduce a disk-aware algorithm that finds time series motifs exactly in multi-gigabyte databases containing on the order of tens of millions of time series. We evaluate our algorithm on datasets from diverse areas, including medicine, anthropology, computer networking and image processing, and show that we can find interesting and meaningful motifs in datasets that are many orders of magnitude larger than anything considered before.
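To make the notion of an exact motif concrete, the following is a minimal in-memory sketch, not the disk-aware algorithm the abstract describes. It assumes the motif is the closest pair of equal-length time series under Euclidean distance, and it uses a single reference (pivot) series with the triangle inequality to skip pairs that cannot beat the best distance found so far. The function names `euclidean`, `brute_force_motif` and `pivot_pruned_motif` and the `pivot_index` parameter are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_motif(database):
    """Exact closest pair by checking every pair: O(n^2) distance calls."""
    best_dist, best_pair = math.inf, None
    for i, j in combinations(range(len(database)), 2):
        d = euclidean(database[i], database[j])
        if d < best_dist:
            best_dist, best_pair = d, (i, j)
    return best_dist, best_pair


def pivot_pruned_motif(database, pivot_index=0):
    """Exact closest pair with pruning against one pivot (assumed setup).

    By the triangle inequality, |d(a, p) - d(b, p)| <= d(a, b), so the
    difference of pivot distances is a lower bound on the true distance.
    """
    pivot = database[pivot_index]
    ref_all = [euclidean(series, pivot) for series in database]
    # Visit series in order of increasing distance to the pivot.
    order = sorted(range(len(database)), key=ref_all.__getitem__)
    ref = [ref_all[k] for k in order]

    best_dist, best_pair = math.inf, None
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            # The lower bound only grows with b, so stop once it reaches
            # the best distance found so far.
            if ref[b] - ref[a] >= best_dist:
                break
            d = euclidean(database[order[a]], database[order[b]])
            if d < best_dist:
                best_dist = d
                best_pair = tuple(sorted((order[a], order[b])))
    return best_dist, best_pair
```

Both functions return the same best distance, since the pruning step discards only pairs whose lower bound already rules them out. A database that does not fit in main memory would additionally need the data to be read in blocks, which this sketch does not attempt.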