A Bit Level Representation for Time Series Data Mining with Shape Based Similarity

Authors:
Anthony Bagnall;Chotirat "Ann" Ratanamahatana;Eamonn Keogh;Stefano Lonardi;Gareth Janacek
Affiliations:
School of Computing Sciences, University of East Anglia, Norwich, UK;Department of Computer Engineering, Chulalongkorn University, Bangkok, Thailand;Department of Computer Science and Engineering, University of California, Riverside, USA;Department of Computer Science and Engineering, University of California, Riverside, USA;School of Computing Sciences, University of East Anglia, Norwich, UK
Venue:
Data Mining and Knowledge Discovery
Year:
2006

Abstract

Clipping is the process of transforming a real-valued series into a sequence of bits, each indicating whether the corresponding data point is above or below the series mean. In this paper, we argue that clipping is a useful and flexible transformation for the exploratory analysis of large time-dependent data sets. We demonstrate how time series stored as bits can be very efficiently compressed and manipulated and that, under some assumptions, the discriminatory power of clipped series is asymptotically equivalent to that of the raw data. Unlike other transformations, clipped series can be compared directly to the raw data series. We show that this allows us to define a tight lower-bounding distance for both Euclidean and Dynamic Time Warping distance, and hence to query by content efficiently. Clipped data can be used in conjunction with a host of algorithms and statistical tests that follow naturally from the binary nature of the data. A series of experiments illustrates how clipped series can be used in increasingly complex ways to achieve better results than other popular representations. The usefulness of the proposed representation is demonstrated by the fact that, at the same compression ratio, results with clipped data are consistently better than those achieved with a Wavelet or Discrete Fourier Transform for both clustering and query by content. The flexibility of the representation is shown by the fact that we can exploit a variable Run Length Encoding of clipped series to define an approximation of the Kolmogorov complexity, and hence perform Kolmogorov-based clustering.
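The abstract describes clipping, bit-level storage, run-length encoding, and a lower bound that compares a raw query directly against a clipped series. The Python sketch below illustrates these ideas under stated assumptions, not as the authors' implementation: all function names (`znormalize`, `clip`, `pack_bits`, `run_lengths`, `lb_clipped`) are hypothetical; series are z-normalized and clipped at their own mean; and the lower bound shown is one straightforward construction for Euclidean distance only, derived from the sign argument in its docstring, which may differ from the exact measure in the paper. The DTW bound and the paper's variable Run Length Encoding for approximating Kolmogorov complexity are not reproduced; only plain run-length encoding is shown.

```python
import math
from itertools import groupby


def znormalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # constant series: every point sits on the mean
    return [(x - mean) / std for x in series]


def clip(series):
    """Clip a real-valued series: 1 if a point is above the series mean, else 0."""
    mean = sum(series) / len(series)
    return [1 if x > mean else 0 for x in series]


def pack_bits(bits):
    """Pack a bit list into one integer (first point = most significant bit).

    Leading zeros are not recoverable from the integer alone, so the series
    length must be stored alongside it.
    """
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def run_lengths(bits):
    """Plain run-length encoding of a bit sequence as (bit, run length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def euclidean(q, c):
    """Euclidean distance between two equal-length real-valued series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))


def lb_clipped(q, c_bits):
    """Lower bound on euclidean(q, c) that needs only the clipped bits of c.

    Assumes q and c are z-normalized and c_bits = clip(c_raw), so bit 1 means
    c_i > 0. Where q_i falls on the other side of 0 from c_i,
    |q_i - c_i| >= |q_i|, so q_i**2 is a safe per-point contribution;
    points whose sign matches the bit contribute 0.
    """
    total = 0.0
    for qi, bit in zip(q, c_bits):
        q_bit = 1 if qi > 0 else 0
        if q_bit != bit:
            total += qi * qi
    return math.sqrt(total)


if __name__ == "__main__":
    q_raw = [1, 2, 3, 4, 5, 6, 7, 8]
    c_raw = [8, 7, 6, 5, 4, 3, 2, 1]
    bits = clip(c_raw)
    print(bits)               # [1, 1, 1, 1, 0, 0, 0, 0]
    print(pack_bits(bits))    # 240
    print(run_lengths(bits))  # [(1, 4), (0, 4)]
    zq, zc = znormalize(q_raw), znormalize(c_raw)
    print(lb_clipped(zq, bits), euclidean(zq, zc))  # about 2.83 and 5.66
```

Because points where the query's sign agrees with the clipped bit contribute nothing, the bound can only underestimate the true distance, which is what makes it safe for pruning candidates during query by content. Storing one bit per point instead of a 64-bit float is where the compression comes from.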