Experimental comparison of representation methods and distance measures for time series data

Authors:
Xiaoyue Wang;Abdullah Mueen;Hui Ding;Goce Trajcevski;Peter Scheuermann;Eamonn Keogh
Affiliations:
University of California Riverside, Riverside, USA;University of California Riverside, Riverside, USA;Northwestern University, Evanston, USA;Northwestern University, Evanston, USA;Northwestern University, Evanston, USA;University of California Riverside, Riverside, USA
Venue:
Data Mining and Knowledge Discovery
Year:
2013

Abstract

The past decade has seen a remarkable increase in interest in applications that query and mine time series data. Much of the research in this context has focused on introducing new representation methods for dimensionality reduction or novel similarity measures for the underlying data. In the vast majority of cases, each work introducing a particular method has made specific claims and, aside from occasional theoretical justification, supported them with quantitative experimental observations. For the most part, however, the comparative aspects of these experiments were too narrowly focused on demonstrating the benefits of the proposed method over a few previously introduced ones. To provide a comprehensive validation, we conducted an extensive experimental study, re-implementing eight time series representations and nine similarity measures and their variants, and testing their effectiveness on 38 time series data sets from a wide variety of application domains. In this article, we give an overview of these techniques and present our comparative experimental findings regarding their effectiveness. In addition to providing a unified validation of some existing achievements, our experiments indicate that, in some cases, certain claims in the literature may be unduly optimistic.
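As a rough illustration of the kind of comparison described above, the sketch below puts one lock-step measure (Euclidean distance), one elastic measure (dynamic time warping) and one dimensionality-reducing representation (piecewise aggregate approximation) side by side. The abstract does not name the specific techniques or the evaluation protocol, so these three choices, the 1-nearest-neighbour error used as the effectiveness score, and the toy series `TRAIN` and `TEST` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Lock-step measure: compares the i-th point of a with the i-th point of b."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Elastic measure: unconstrained dynamic time warping, squared point costs."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    if len(series) % segments != 0:
        raise ValueError("this sketch assumes the length is divisible by segments")
    w = len(series) // segments
    return [sum(series[k * w:(k + 1) * w]) / w for k in range(segments)]


def one_nn_error(train, test, dist):
    """Fraction of test series whose nearest training series has a different label."""
    errors = 0
    for series, label in test:
        _, predicted = min(((dist(series, s), l) for s, l in train), key=lambda t: t[0])
        errors += predicted != label
    return errors / len(test)


# Hypothetical toy data: a "bump" class whose peak may be shifted in time, and a "flat" class.
TRAIN = [([0, 1, 2, 1, 0, 0, 0, 0], "bump"), ([0, 0, 0, 0, 0, 0, 0, 0], "flat")]
TEST = [([0, 0, 0, 0, 1, 2, 1, 0], "bump")]

if __name__ == "__main__":
    for name, dist in (("euclidean", euclidean), ("dtw", dtw)):
        print(name, one_nn_error(TRAIN, TEST, dist))
    print("PAA of the test series with 4 segments:", paa(TEST[0][0], 4))
```

On this toy data the shifted bump is misclassified under Euclidean distance and classified correctly under dynamic time warping. That outcome is a property of the hand-built example only and says nothing about the study's findings across its 38 data sets.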