State-space dynamics distance for clustering sequential data

  • Authors:
  • Darío García-García;Emilio Parrado-Hernández;Fernando Diaz-de-Maria

  • Affiliations:
  • Signal Theory and Communications Department, Escuela Politécnica Superior, Universidad Carlos III de Madrid, Avda. de la Universidad, 30, 28911 Leganés, Spain (all authors)

  • Venue:
  • Pattern Recognition
  • Year:
  • 2011

Abstract

This paper proposes a novel similarity measure for clustering sequential data. We first construct a common state space by training a single probabilistic model on all the sequences, which yields a unified representation of the dataset. Distances are then computed from the transition matrices that each sequence induces in that state space. This approach mitigates some of the usual overfitting and scalability issues of existing semi-parametric techniques, which rely on training a separate model for each sequence. Empirical studies on both synthetic and real-world datasets illustrate the advantages of the proposed similarity measure for clustering sequences.
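
The two-stage idea in the abstract (one shared state space, then distances between per-sequence transition matrices) can be sketched as below. This is a minimal illustration under loud assumptions, not the paper's implementation: a 1-D k-means quantizer stands in for the single probabilistic model, hard nearest-center assignment stands in for state inference, and the Frobenius norm between smoothed transition matrices is an assumed choice of distance. All function names are hypothetical.

```python
import numpy as np

def assign_states(x, centers):
    # Hard-assign each 1-D observation to its nearest center.
    # ASSUMPTION: replaces probabilistic state inference.
    return np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)

def fit_common_states(sequences, k, n_iter=20):
    # Fit ONE quantizer on the pooled observations of all sequences.
    # ASSUMPTION: k-means is a stand-in for the shared probabilistic model.
    pooled = np.concatenate(sequences)
    centers = np.quantile(pooled, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = assign_states(pooled, centers)
        for s in range(k):
            if np.any(labels == s):  # leave empty clusters unchanged
                centers[s] = pooled[labels == s].mean()
    return centers

def transition_matrix(states, k, alpha=1.0):
    # Row-normalized transition counts with additive smoothing alpha,
    # so that rows for rarely visited states remain well defined.
    counts = np.full((k, k), alpha)
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def dynamics_distances(sequences, k):
    # Pairwise distances between the transition matrices that each
    # sequence induces in the common state space.
    # ASSUMPTION: Frobenius norm as the matrix distance.
    centers = fit_common_states(sequences, k)
    mats = [transition_matrix(assign_states(s, centers), k) for s in sequences]
    n = len(mats)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = np.linalg.norm(mats[i] - mats[j])
    return dist

# Toy data: all sequences visit the same two values but with different dynamics.
seqs = [
    np.array([0.0, 1.0] * 10),            # alternating
    np.array([1.0, 0.0] * 10),            # alternating
    np.array([0.0] * 10 + [1.0] * 10),    # long runs
    np.array([1.0] * 10 + [0.0] * 10),    # long runs
]
D = dynamics_distances(seqs, k=2)
```

In this toy case the two alternating sequences end up close to each other and far from the two long-run sequences, even though all four share the same observation values. The resulting matrix `D` could then be passed to any distance-based clustering method. Note that only one model is fitted regardless of the number of sequences, which is the source of the scalability benefit the abstract mentions.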