iSAX: disk-aware mining and indexing of massive time series datasets

Authors:
Jin Shieh;Eamonn Keogh
Affiliations:
Department of Computer Science & Engineering, University of California, Riverside, USA;Department of Computer Science & Engineering, University of California, Riverside, USA
Venue:
Data Mining and Knowledge Discovery
Year:
2009

Abstract

Current research in indexing and mining time series data has produced many interesting algorithms and representations. However, the algorithms and the size of data considered have generally not been representative of the increasingly massive datasets encountered in science, engineering, and business domains. In this work, we introduce a novel multi-resolution symbolic representation that can be used to index datasets several orders of magnitude larger than any previously considered in the literature. To demonstrate the utility of this representation, we constructed a simple tree-based index structure that facilitates fast exact search and approximate search that is orders of magnitude faster still. For example, with a database of one hundred million time series, the approximate search can retrieve high-quality nearest neighbors in slightly over a second, whereas a sequential scan would take tens of minutes. Our experimental evaluation demonstrates that our representation allows index performance to scale well with increasing dataset sizes. Additionally, we provide analysis concerning parameter sensitivity, approximate search effectiveness, and lower bound comparisons between time series representations in a bit-constrained environment. We further show how to exploit the combination of exact and approximate search as subroutines in data mining algorithms, allowing for the exact mining of truly massive real-world datasets containing tens of millions of time series.
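To make the idea of a multi-resolution symbolic representation concrete, the following is a minimal sketch in Python. It assumes the usual SAX construction (z-normalization, piecewise aggregate approximation, and breakpoints that split a standard normal distribution into equiprobable regions), none of which the abstract spells out. The function names, the numbering of symbols from the lowest region upward, and the restriction to cardinalities that are powers of two are illustrative assumptions, not the authors' code.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of segments")
    width = n // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def breakpoints(cardinality):
    """Cut points dividing a standard normal into `cardinality` equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / cardinality) for i in range(1, cardinality)]


def sax_word(series, segments, bits):
    """Symbolic word with cardinality 2**bits per segment (assumed labeling: 0 = lowest)."""
    cuts = breakpoints(2 ** bits)
    return [bisect_right(cuts, v) for v in paa(znorm(series), segments)]


def demote(word, from_bits, to_bits):
    """Coarsen a word by keeping only the leading `to_bits` bits of each symbol."""
    shift = from_bits - to_bits
    return [symbol >> shift for symbol in word]


if __name__ == "__main__":
    series = [1, 2, 3, 4, 5, 6, 7, 8]
    fine = sax_word(series, segments=4, bits=3)    # cardinality 8
    coarse = sax_word(series, segments=4, bits=2)  # cardinality 4
    print(fine, coarse, demote(fine, 3, 2))
```

Because the breakpoints for cardinality 4 are a subset of those for cardinality 8, dropping the lowest bit of each fine symbol gives the same word as computing the coarse word directly. This nesting is what allows words of different resolutions to be compared, and it is the kind of property a tree-based index can use to split a node by refining one symbol at a time.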