Mixture model and MDSDCA for textual data

  • Authors:
  • Faryel Allouti; Mohamed Nadif; Le Thi Hoai An; Benoît Otjacques

  • Affiliations:
  • LIPADE, UFR MI, Paris Descartes University, Paris, France; LIPADE, UFR MI, Paris Descartes University, Paris, France; LITA, UFR MIM, Paul Verlaine University of Metz, Metz, France; Public Research Center-Gabriel Lippmann, Informatics, Systems and Collaboration Department, Belvaux, Luxembourg

  • Venue:
  • CDVE'09 Proceedings of the 6th international conference on Cooperative design, visualization, and engineering
  • Year:
  • 2009

Abstract

E-mail has become an essential component of cooperation in business. Consequently, the large number of messages, whether written manually or generated automatically, can rapidly cause information overload for users. Many research projects have examined this issue, but surprisingly few have tackled the files attached to e-mails, which in many cases carry a substantial part of the semantics of the message. This paper addresses that specific topic and focuses on the clustering and visualization of attached files. Relying on the multinomial mixture model, we use the Classification EM (CEM) algorithm to cluster the set of files, and MDSDCA to visualize the resulting classes of documents. Like the Multidimensional Scaling method, the MDSDCA algorithm, which is based on Difference of Convex functions (DC) programming, aims to minimize the stress criterion. Because MDSDCA is iterative, we propose an initialization approach that avoids starting from random values. Experiments are carried out on simulated and textual data.
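
As an illustration of the clustering step named in the abstract, the following is a minimal sketch of classification EM with hard assignments on a multinomial mixture. It is not the authors' implementation. The function name `cem_multinomial`, the optional `init` argument, the add-one smoothing, and the stopping rule are all assumptions made for this example. Documents are represented as rows of term counts.

```python
import math
import random


def cem_multinomial(docs, k, init=None, n_iter=50, seed=0):
    """Hard-assignment (classification) EM for a multinomial mixture.

    docs: list of equal-length lists of term counts (one row per document).
    init: optional starting labels; random labels are used otherwise.
    Returns (labels, pi, theta). All names here are illustrative assumptions.
    """
    n, d = len(docs), len(docs[0])
    rng = random.Random(seed)
    if init is not None:
        labels = list(init)
    else:
        labels = [rng.randrange(k) for _ in range(n)]

    for _ in range(n_iter):
        # M-step: cluster proportions and term probabilities,
        # with add-one smoothing (an assumption) to avoid log(0).
        sizes = [1.0] * k
        counts = [[1.0] * d for _ in range(k)]
        for x, z in zip(docs, labels):
            sizes[z] += 1.0
            for j, c in enumerate(x):
                counts[z][j] += c
        pi = [s / (n + k) for s in sizes]
        theta = [[c / sum(row) for c in row] for row in counts]

        # E-step and C-step: score each document under each component
        # (log posterior up to a constant) and keep the best component.
        new_labels = []
        for x in docs:
            scores = [
                math.log(pi[c])
                + sum(xj * math.log(t) for xj, t in zip(x, theta[c]))
                for c in range(k)
            ]
            new_labels.append(max(range(k), key=scores.__getitem__))

        if new_labels == labels:  # partition is stable: stop
            break
        labels = new_labels

    return labels, pi, theta
```

Passing starting labels through `init` makes the run deterministic. The abstract's own initialization proposal concerns MDSDCA, and this sketch does not attempt to reproduce it.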