Model-based clustering of high-dimensional data streams with online mixture of probabilistic PCA

  • Authors: Anastasios Bellas; Charles Bouveyron; Marie Cottrell; Jérôme Lacaille

  • Affiliations: SAMM (EA 4543), Université Paris 1, Paris Cedex 13, France 75634; SAMM (EA 4543), Université Paris 1, Paris Cedex 13, France 75634; SAMM (EA 4543), Université Paris 1, Paris Cedex 13, France 75634; Snecma, Groupe Safran, Moissy Cramayel, France 77550

  • Venue: Advances in Data Analysis and Classification
  • Year: 2013

Abstract

Model-based clustering is a popular tool, renowned for its probabilistic foundations and its flexibility. However, model-based clustering techniques usually perform poorly on high-dimensional data streams, which are now a common type of data. To overcome this limitation, we propose an online inference algorithm for the mixture of probabilistic PCA model. The proposed algorithm relies on an EM-based procedure and on a probabilistic, incremental version of PCA. Model selection is also addressed in the online setting through parallel computing. Numerical experiments on simulated and real data demonstrate the effectiveness of our approach and compare it with state-of-the-art online EM-based algorithms.