Extracting Knowledge from Life Courses: Clustering and Visualization

  • Authors:
  • Nicolas S. Müller, Alexis Gabadinho, Gilbert Ritschard, Matthias Studer

  • Affiliations:
  • Department of Econometrics and Laboratory of Demography, University of Geneva (all four authors)

  • Venue:
  • DaWaK '08: Proceedings of the 10th International Conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2008

Abstract

This article presents some of the facilities that our TraMineR R package offers for clustering and visualizing sequence data. First, we discuss our implementation of the optimal matching algorithm for evaluating the distance between two sequences, and its use for generating a distance matrix for the whole sequence data set. Once such a matrix is obtained, it can serve as input for a cluster analysis, which can be done straightforwardly with any method available in the R statistical environment. We then present three kinds of plots for visualizing the characteristics of the resulting clusters: an aggregated plot depicting the average sequential behavior of cluster members; a sequence index plot that shows the diversity within clusters; and an original frequency plot that highlights the frequencies of the n most frequent sequences. TraMineR was designed for analyzing sequences representing life courses, and our presentation is illustrated with a real-world data set of this kind. The material presented should also be of interest for other kinds of sequential data, such as DNA sequences or web logs.
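To make the optimal matching step concrete, here is a minimal Python sketch of an optimal matching distance and of the pairwise distance matrix built from it. It is written for illustration only and is not TraMineR's implementation (TraMineR is an R package). It assumes a single constant insertion/deletion (indel) cost and a substitution-cost function; the default values (indel 1, substitution 2 for differing states) and the state codes S, M and C are hypothetical choices.

```python
def om_distance(a, b, indel=1.0, sub=None):
    """Optimal matching distance: minimal total cost of turning sequence a into
    sequence b using insertions/deletions (cost `indel`) and substitutions
    (cost `sub(x, y)`). Cost values are illustrative assumptions."""
    if sub is None:
        # Hypothetical constant substitution cost: 0 for equal states, 2 otherwise.
        sub = lambda x, y: 0.0 if x == y else 2.0
    n, m = len(a), len(b)
    # d[i][j] holds the minimal cost of transforming a[:i] into b[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel  # delete all of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                         # delete a[i-1]
                d[i][j - 1] + indel,                         # insert b[j-1]
                d[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),   # substitute or match
            )
    return d[n][m]


def distance_matrix(seqs, indel=1.0, sub=None):
    """Symmetric matrix of pairwise OM distances with a zero diagonal."""
    k = len(seqs)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = om_distance(seqs[i], seqs[j], indel, sub)
    return dist


if __name__ == "__main__":
    # Toy life courses: S = single, M = married, C = with children (hypothetical).
    for row in distance_matrix(["SSMM", "SMMM", "SSSS"]):
        print(row)
    # prints [0.0, 2.0, 4.0], [2.0, 0.0, 6.0], [4.0, 6.0, 0.0] on three lines
```

Because a substitution here costs exactly as much as one deletion plus one insertion, the distance reduces to counting unmatched positions; lowering the substitution cost makes the alignment prefer substitutions. In R, a matrix like this can be converted with `as.dist()` and passed to a clustering routine such as `hclust()`.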
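The aggregated and frequency plots summarize simple per-cluster statistics. The Python sketch below computes two of them for one cluster, leaving the plotting out: the share of members in each state at every time position (our assumption about what the aggregated plot averages) and the relative frequencies of the n most frequent sequences. The function names, the equal-length requirement and the toy cluster are hypothetical.

```python
from collections import Counter


def state_distribution(seqs, alphabet):
    """Share of sequences in each state at every position.

    Assumes all sequences have the same length (a simplifying assumption)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    total = len(seqs)
    result = []
    for t in range(length):
        counts = Counter(s[t] for s in seqs)
        result.append({state: counts[state] / total for state in alphabet})
    return result


def most_frequent(seqs, n):
    """The n most frequent distinct sequences with their relative frequencies."""
    total = len(seqs)
    return [(seq, count / total) for seq, count in Counter(seqs).most_common(n)]


if __name__ == "__main__":
    # Hypothetical cluster of life courses (S = single, M = married, C = with children).
    cluster = ["SSMM", "SSMM", "SMMM", "SSMC"]
    for t, shares in enumerate(state_distribution(cluster, "SMC")):
        print(t, shares)
    print(most_frequent(cluster, 1))  # prints [('SSMM', 0.5)]
```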