Visualization of Structured Data via Generative Probabilistic Modeling

  • Authors: Nikolaos Gianniotis; Peter Tiňo

  • Affiliations: School of Computer Science, University of Birmingham, Birmingham B15 2TT, United Kingdom (both authors)

  • Venue: Similarity-Based Clustering
  • Year: 2009

Abstract

We propose a generative probabilistic approach to constructing topographic maps of sequences and tree-structured data. The model formulation specifies a low-dimensional manifold of local noise models on the structured data. The manifold is induced by a smooth mapping from a low-dimensional Euclidean latent space to the parameter space of the local noise models. In this paper, we consider noise models endowed with hidden Markovian state-space structure, namely Hidden Markov Tree Models (HMTM) and Hidden Markov Models (HMM). Compared with recursive extensions of the traditional Self-Organizing Map that can be used to visualize sequential or tree-structured data, topographic maps formulated within this framework possess a number of advantages: a well-defined cost function that drives the model optimization, the ability to test for overfitting, and the accommodation of alternative local noise models that implicitly express different notions of structured data similarity. Additionally, using information geometry, one can calculate magnification factors on the constructed topographic maps. Magnification factors are a useful tool for cluster detection in non-linear topographic map formulations. We demonstrate the framework on two artificial data sets and chorales by J.S. Bach represented as sequences, as well as on images represented as trees.
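
The central construction, a smooth mapping from a low-dimensional latent space to the parameters of local HMM noise models, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the mapping is parameterized, so the sketch assumes a GTM-style choice: a regular latent grid, RBF basis functions, and softmax normalization. It also assumes discrete emissions and a uniform prior over grid points. The weights are random rather than trained, and all function and variable names are hypothetical. Each latent grid point defines one HMM, and a sequence is placed on the map at the posterior mean of the latent points given the sequence.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def latent_grid(n_per_side):
    # Regular grid of points in the 2-D latent square [-1, 1]^2.
    axis = np.linspace(-1.0, 1.0, n_per_side)
    xx, yy = np.meshgrid(axis, axis)
    return np.column_stack([xx.ravel(), yy.ravel()])

def rbf_features(points, centres, width):
    # Gaussian RBF activations plus a constant bias column (assumed basis).
    d2 = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((points.shape[0], 1))])

def hmm_params(phi, W_init, W_trans, W_emit, n_states, n_symbols):
    # Smooth map: basis activations -> logits -> valid HMM probabilities.
    n_points = phi.shape[0]
    pi = softmax(phi @ W_init)
    A = softmax((phi @ W_trans).reshape(n_points, n_states, n_states))
    B = softmax((phi @ W_emit).reshape(n_points, n_states, n_symbols))
    return pi, A, B

def hmm_log_likelihood(seq, pi, A, B):
    # Scaled forward algorithm for a discrete-emission HMM.
    alpha = pi * B[:, seq[0]]
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale
    for symbol in seq[1:]:
        alpha = (alpha @ A) * B[:, symbol]
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale
    return log_lik

def project(seq, grid, pi, A, B):
    # Posterior over latent points (uniform prior assumed) and its mean.
    log_liks = np.array([
        hmm_log_likelihood(seq, pi[c], A[c], B[c])
        for c in range(grid.shape[0])
    ])
    resp = softmax(log_liks)
    return resp @ grid, resp

# Illustrative setup with random (untrained) weights.
rng = np.random.default_rng(0)
n_states, n_symbols = 3, 4
grid = latent_grid(5)        # 25 latent points
centres = latent_grid(3)     # 9 RBF centres
phi = rbf_features(grid, centres, width=1.0)
n_basis = phi.shape[1]
W_init = rng.normal(size=(n_basis, n_states))
W_trans = rng.normal(size=(n_basis, n_states * n_states))
W_emit = rng.normal(size=(n_basis, n_states * n_symbols))
pi, A, B = hmm_params(phi, W_init, W_trans, W_emit, n_states, n_symbols)

seq = rng.integers(0, n_symbols, size=20)
position, resp = project(seq, grid, pi, A, B)
print("latent position of the sequence:", position)
```

In a trained model the weight matrices would be fitted by maximizing the data likelihood, which is the well-defined cost function the abstract refers to. The sketch only shows how a sequence is mapped to a point on the visualization plane once such parameters are available.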