On the need for time series data mining benchmarks: a survey and empirical demonstration

  • Authors: Eamonn Keogh; Shruti Kasetty
  • Affiliations: University of California - Riverside, Riverside, CA (both authors)
  • Venue: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year: 2002

Abstract

In the last decade there has been an explosion of interest in mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contribution made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offers an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point, we have undertaken the most exhaustive set of time series experiments ever attempted, re-implementing the contributions of more than two dozen papers and testing them on 50 real world, highly diverse datasets. Our empirical results strongly support our assertion, and suggest the need for a set of time series benchmarks and more careful empirical evaluation in the data mining community.
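To make the abstract's argument concrete, the sketch below is a minimal, hypothetical illustration (not the authors' code) of the style of experiment described: compare a baseline distance measure against a "new" one for 1-NN time series classification across several datasets, then check whether the per-dataset improvement is small relative to the accuracy variance across datasets. The datasets, the `tweaked` measure, and all function names are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length series."""
    return np.sqrt(np.sum((a - b) ** 2))

def tweaked(a, b):
    """Stand-in for a 'new' measure: Euclidean on z-normalized series."""
    az = (a - a.mean()) / (a.std() + 1e-8)
    bz = (b - b.mean()) / (b.std() + 1e-8)
    return euclidean(az, bz)

def one_nn_accuracy(train_X, train_y, test_X, test_y, dist):
    """1-NN accuracy of the distance function `dist` on one dataset."""
    correct = 0
    for x, y in zip(test_X, test_y):
        d = [dist(x, t) for t in train_X]
        if train_y[int(np.argmin(d))] == y:
            correct += 1
    return correct / len(test_y)

def synthetic_dataset(n_per_class=30, length=64, noise=0.5):
    """Two-class toy dataset: noisy sine vs. noisy cosine series."""
    t = np.linspace(0, 2 * np.pi, length)
    X, y = [], []
    for label, base in enumerate((np.sin(t), np.cos(t))):
        for _ in range(n_per_class):
            X.append(base + noise * rng.standard_normal(length))
            y.append(label)
    X, y = np.array(X), np.array(y)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    return X[idx[:half]], y[idx[:half]], X[idx[half:]], y[idx[half:]]

# "Many" datasets, here simulated by sweeping the noise level.
accs_base, accs_new = [], []
for noise in np.linspace(0.2, 1.5, 10):
    tr_X, tr_y, te_X, te_y = synthetic_dataset(noise=noise)
    accs_base.append(one_nn_accuracy(tr_X, tr_y, te_X, te_y, euclidean))
    accs_new.append(one_nn_accuracy(tr_X, tr_y, te_X, te_y, tweaked))

accs_base, accs_new = np.array(accs_base), np.array(accs_new)
print(f"mean improvement of new measure : {np.mean(accs_new - accs_base):+.3f}")
print(f"std of accuracy across datasets : {np.std(accs_base):.3f}")
```

If the printed cross-dataset standard deviation exceeds the mean improvement, the apparent gain of the "new" measure is within the noise introduced simply by the choice of dataset, which is the kind of comparison the paper argues should be standard practice.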