Summarizing a set of time series by averaging: From Steiner sequence to compact multiple alignment

Authors:
François Petitjean;Pierre Gançarski
Venue:
Theoretical Computer Science
Year:
2012



Abstract

Summarizing a set of sequences is an old topic that has been revived in the last decade by the increasing availability of sequential datasets. Defining a consensus object is central to data analysis, since it crystallizes the underlying organization of the data. Dynamic Time Warping (DTW) is currently the most relevant similarity measure between sequences for a wide range of applications, because it can capture temporal distortions. In this context, averaging a set of sequences is not a trivial task, since the average sequence has to be consistent with this similarity measure. Steiner theory and several works in computational biology have pointed out the connection between multiple alignments and average sequences. Taking inspiration from these works, we introduce the notion of compact multiple alignment, which allows us to link these theories to the problem of summarizing under time warping. Having defined the link between the multiple alignment and the average sequence, the second part of this article focuses on exploring the space of compact multiple alignments in order to provide an average sequence of a set of sequences. We propose to use a genetic algorithm based on a specific representation of the genotype inspired by genes. This representation of the genotype makes it possible to consistently paint the fitness landscape. Experiments carried out on standard datasets show that the proposed approach outperforms existing methods.
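
As an illustration of why DTW "captures temporal distortions", the following minimal sketch compares a textbook DTW cost with a lock-step (point-by-point) cost on two toy sequences that have the same shape but are shifted in time. It is not the method of the paper: the function names `dtw_cost` and `lockstep_cost`, the squared pointwise cost, and the toy data are all assumptions chosen for illustration.

```python
import math


def lockstep_cost(a, b):
    """Sum of squared differences between points at the same index (no warping)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dtw_cost(a, b):
    """Textbook dynamic-programming DTW with a squared pointwise cost.

    cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical toy data: the same bump, shifted by one time step.
x = [0, 0, 1, 2, 1, 0]
y = [0, 1, 2, 1, 0, 0]

print(lockstep_cost(x, y))  # 4
print(dtw_cost(x, y))       # 0.0
```

The lock-step comparison penalizes the shift, whereas DTW warps the time axis and finds a zero-cost alignment. This is also why a plain point-by-point mean is not, in general, consistent with DTW, and why the average has to be defined through an alignment of the sequences.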