Summarizing a set of time series by averaging: From Steiner sequence to compact multiple alignment

  • Authors:
  • François Petitjean;Pierre Gançarski

  • Affiliations:
  • -;-

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2012

Quantified Score

Hi-index 5.23

Visualization

Abstract

Summarizing a set of sequences is an old topic that has been revived in the last decade, due to the increasing availability of sequential datasets. The definition of a consensus object is on the center of data analysis issues, since it crystallizes the underlying organization of the data. Dynamic Time Warping (DTW) is currently the most relevant similarity measure between sequences for a large panel of applications, since it makes it possible to capture temporal distortions. In this context, averaging a set of sequences is not a trivial task, since the average sequence has to be consistent with this similarity measure. The Steiner theory and several works in computational biology have pointed out the connection between multiple alignments and average sequences. Taking inspiration from these works, we introduce the notion of compact multiple alignment, which allows us to link these theories to the problem of summarizing under time warping. Having defined the link between the multiple alignment and the average sequence, the second part of this article focuses on the scan of the space of compact multiple alignments in order to provide an average sequence of a set of sequences. We propose to use a genetic algorithm based on a specific representation of the genotype inspired by genes. This representation of the genotype makes it possible to consistently paint the fitness landscape. Experiments carried out on standard datasets show that the proposed approach outperforms existing methods.