A Hierarchical Model for Clustering and Categorising Documents

  • Authors:
  • Éric Gaussier;Cyril Goutte;Kris Popat;Francine Chen

  • Venue:
  • Proceedings of the 24th BCS-IRSG European Colloquium on IR Research: Advances in Information Retrieval
  • Year:
  • 2002

Abstract

We propose a new hierarchical generative model for textual data, in which words may be generated by topic-specific distributions at any level of the hierarchy. This model is well suited both to clustering documents in preset or automatically generated hierarchies and to categorising new documents in an existing hierarchy. Training algorithms are derived for both cases and illustrated on real data by clustering news stories and categorising newsgroup messages. Finally, the generative model may be used to derive a Fisher kernel expressing similarity between documents.
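
To make the idea of "words generated at any level in the hierarchy" concrete, here is a minimal toy sketch in Python. It is not the authors' algorithm: the two-leaf hierarchy, the names `PATHS`, `LEVEL_DIST`, `WORD_DIST`, and all probabilities are hypothetical values chosen for illustration, and the abstract does not specify the exact parameterisation or the training procedure. The sketch assumes each document belongs to one leaf class and each word is drawn from the distribution of some node on the path from the root to that leaf.

```python
import math
import random

# Hypothetical hierarchy: each leaf class maps to its path from the root.
PATHS = {
    "sports": ["root", "sports"],
    "politics": ["root", "politics"],
}

# Assumed probability of choosing each node on the path, given the leaf class.
LEVEL_DIST = {
    "sports": {"root": 0.4, "sports": 0.6},
    "politics": {"root": 0.4, "politics": 0.6},
}

# Illustrative per-node word distributions (made-up numbers, not from the paper).
WORD_DIST = {
    "root": {"the": 0.5, "of": 0.3, "said": 0.2},
    "sports": {"match": 0.6, "goal": 0.4},
    "politics": {"vote": 0.7, "senate": 0.3},
}


def sample(dist, rng):
    """Draw one key from a {key: probability} dictionary."""
    keys = list(dist)
    weights = [dist[k] for k in keys]
    return rng.choices(keys, weights=weights, k=1)[0]


def generate_document(leaf, length, rng):
    """Generate words for a document of class `leaf`.

    Each word first picks a node on the root-to-leaf path, then a word
    from that node's distribution, so general words can come from the
    root and specific words from the leaf.
    """
    words = []
    for _ in range(length):
        node = sample(LEVEL_DIST[leaf], rng)
        words.append(sample(WORD_DIST[node], rng))
    return words


def word_probability(word, leaf):
    """Marginal probability of `word` given `leaf`, summing over path nodes."""
    return sum(
        p_node * WORD_DIST[node].get(word, 0.0)
        for node, p_node in LEVEL_DIST[leaf].items()
    )


def classify(words, prior):
    """Return the leaf class with the highest (unnormalised) posterior score."""
    scores = {
        leaf: prior[leaf] * math.prod(word_probability(w, leaf) for w in words)
        for leaf in PATHS
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rng = random.Random(0)
    print(generate_document("sports", 8, rng))
    print(classify(["match", "the", "goal"], {"sports": 0.5, "politics": 0.5}))
```

Categorising a new document then amounts to scoring it under each leaf class and picking the best, as `classify` does here. Clustering would additionally require estimating these distributions from unlabeled data, which this sketch does not attempt.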