iSAX: indexing and mining terabyte sized time series

Authors:
Jin Shieh; Eamonn Keogh
Affiliations:
University of California, Riverside, Riverside, CA, USA; University of California, Riverside, Riverside, CA, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Abstract

Current research in indexing and mining time series data has produced many interesting algorithms and representations. However, the algorithms and the size of data considered have generally not been representative of the increasingly massive datasets encountered in science, engineering, and business domains. In this work, we show how a novel multi-resolution symbolic representation can be used to index datasets that are several orders of magnitude larger than anything else considered in the literature. Our approach allows both fast exact search and ultra-fast approximate search. We show how to exploit the combination of both types of search as subroutines in data mining algorithms, allowing for the exact mining of truly massive real-world datasets containing millions of time series.
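
The abstract does not spell out the representation, so the following is only a minimal sketch of what a multi-resolution symbolic word could look like. It assumes the standard SAX discretization (z-normalize, average over equal-width segments, then bin against equiprobable Gaussian breakpoints) and power-of-two cardinalities, so that a symbol at a coarser resolution is obtained by dropping low-order bits. All function names (`znorm`, `paa`, `breakpoints`, `sax_word`, `demote`) are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def znorm(series):
    """Z-normalize a series (zero mean, unit variance)."""
    mu = fmean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: mean of each equal-width segment."""
    if len(series) % segments != 0:
        raise ValueError("sketch assumes len(series) is a multiple of segments")
    width = len(series) // segments
    return [fmean(series[i * width:(i + 1) * width]) for i in range(segments)]


def breakpoints(cardinality):
    """Cut points that split a standard normal into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / cardinality) for k in range(1, cardinality)]


def sax_word(series, segments, cardinality):
    """Map a series to a word of integer symbols in [0, cardinality)."""
    cuts = breakpoints(cardinality)
    return [bisect_right(cuts, v) for v in paa(znorm(series), segments)]


def demote(symbol, from_bits, to_bits):
    """Coarsen a symbol from 2**from_bits to 2**to_bits by dropping low bits.

    Assumption: cardinalities are powers of two, so the coarse breakpoints
    are a subset of the fine ones.
    """
    return symbol >> (from_bits - to_bits)


if __name__ == "__main__":
    series = [0, 1, 2, 3, 4, 5, 6, 7]
    fine = sax_word(series, segments=4, cardinality=8)
    coarse = [demote(s, from_bits=3, to_bits=2) for s in fine]
    print(fine, coarse, sax_word(series, segments=4, cardinality=4))
```

Under these assumptions, the same series can be compared at different resolutions without recomputing the word from raw data, which is the property that makes such a representation usable as an index key.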