Soft memberships for spectral clustering, with application to permeable language distinction

  • Authors:
  • Richard Nock;Pascal Vaillant;Claudia Henry;Frank Nielsen

  • Affiliations:
  • Ceregmia-UFR Droit et Sciences Économiques, Universite des Antilles-Guyane, Campus de Schoelcher, BP 7209, 97275 Schoelcher, Martinique, France;Département Lettres et Sciences Humaines, Celia/CNRS-Institut d'Enseignement Supérieur de Guyane, BP 792, 97337 Cayenne, Guyane, France;Ceregmia-UFR Droit et Sciences Économiques, Universite des Antilles-Guyane, Campus de Schoelcher, BP 7209, 97275 Schoelcher, Martinique, France;LIX-Ecole Polytechnique, 91128 Palaiseau Cedex, France

  • Venue:
  • Pattern Recognition
  • Year:
  • 2009

Abstract

Recently, a large amount of work has been devoted to the study of spectral clustering, a powerful unsupervised classification method. This paper contributes to both its foundations and its applications to text classification. Departing from the mainstream, which is concerned with hard membership, we study the extension of spectral clustering to soft membership (probabilistic, EM-style) assignments. One of its key features is that it avoids the complexity gap of hard membership. We apply this theory to a challenging problem, text clustering for languages with permeable borders, via a novel construction of Markov chains from corpora. Experiments with readily available code clearly display the potential of the method, which yields a visually appealing soft distinction of the languages that may together make up a whole corpus.