Extracting Knowledge from Life Courses: Clustering and Visualization

Authors:
Nicolas S. Müller;Alexis Gabadinho;Gilbert Ritschard;Matthias Studer
Affiliations:
Department of Econometrics and Laboratory of Demography, University of Geneva (all four authors)
Venue:
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Year:
2008

Abstract

This article presents some of the facilities offered by our TraMineR R package for clustering and visualizing sequence data. First, we discuss our implementation of the optimal matching algorithm for evaluating the distance between two sequences, and its use for generating a distance matrix for the whole sequence data set. Once such a matrix is obtained, it can serve as input for a cluster analysis, which can be done straightforwardly with any method available in the R statistical environment. We then present three kinds of plots for visualizing the characteristics of the obtained clusters: an aggregated plot depicting the average sequential behavior of cluster members; a sequence index plot that shows the diversity within clusters; and an original frequency plot that highlights the frequencies of the n most frequent sequences. TraMineR was designed for analysing sequences representing life courses, and our presentation is illustrated on such a real-world data set. The material presented should also be of interest for other kinds of sequential data, such as those arising in DNA analysis or web logs.
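To make the pipeline above concrete, here is a minimal Python sketch of the optimal matching idea: the distance between two sequences is the minimum total cost of transforming one into the other using insertions/deletions (a single `indel` cost) and substitutions (costs looked up in a user-supplied table). It also builds the pairwise distance matrix and counts the n most frequent sequences, the quantity shown in the frequency plot. This is not TraMineR's R implementation; the function names `om_distance`, `distance_matrix` and `most_frequent`, the cost values, and the life-course state codes (`P` = living with parents, `L` = left home, `M` = married, `C` = with children) are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

State = str
SubstCosts = Dict[Tuple[State, State], float]


def om_distance(x: Sequence[State], y: Sequence[State],
                indel: float, subst: SubstCosts) -> float:
    """Optimal matching distance: minimum cost of turning x into y using
    insertions/deletions (cost `indel`) and substitutions (cost from `subst`,
    zero when the two states are identical)."""
    n, m = len(x), len(y)
    # d[i][j] holds the cheapest way to transform x[:i] into y[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel          # delete every element of x[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel          # insert every element of y[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = x[i - 1], y[j - 1]
            sub = 0.0 if a == b else subst[(a, b)]
            d[i][j] = min(d[i - 1][j] + indel,       # delete a
                          d[i][j - 1] + indel,       # insert b
                          d[i - 1][j - 1] + sub)     # substitute a -> b
    return d[n][m]


def distance_matrix(seqs: List[Sequence[State]], indel: float,
                    subst: SubstCosts) -> List[List[float]]:
    """Pairwise OM distances; filling both triangles from one computation
    assumes the substitution costs are symmetric."""
    k = len(seqs)
    mat = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            mat[i][j] = mat[j][i] = om_distance(seqs[i], seqs[j], indel, subst)
    return mat


def most_frequent(seqs: List[Sequence[State]], n: int):
    """The n most frequent distinct sequences with their counts."""
    return Counter(tuple(s) for s in seqs).most_common(n)


if __name__ == "__main__":
    states = "PLMC"
    # Hypothetical constant substitution cost of 2 between distinct states.
    subst = {(a, b): 2.0 for a in states for b in states if a != b}
    seqs = ["PPLM", "PPLL", "PLMC", "PPLM"]
    for row in distance_matrix(seqs, indel=1.0, subst=subst):
        print(row)
    print(most_frequent(seqs, 2))
```

With a substitution cost equal to twice the indel cost, as here, a substitution is never cheaper than a deletion plus an insertion, so the distance reduces to an indel-only comparison; other cost settings change which alignments are preferred. The resulting matrix can then feed any clustering method that accepts precomputed dissimilarities; for example, SciPy's `scipy.spatial.distance.squareform` converts it into the condensed form expected by `scipy.cluster.hierarchy.linkage`.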