The cluster-abstraction model: unsupervised learning of topic hierarchies from text data

  • Authors:
  • Thomas Hofmann

  • Affiliations:
  • Computer Science Division, UC Berkeley & International CS Institute, Berkeley, CA

  • Venue:
  • IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper presents a novel statistical latent class model for text mining and interactive information access. The described learning architecture, called Cluster-Abstraction Model (CAM), is purely data driven and utilizes contact-specific word occurrence statistics. In an intertwined fashion, the CAM extracts hierarchical relations between groups of documents as well as an abstractive organization of keywords. An annealed version of the Expectation-Maximization (EM) algorithm for maximum likelihood estimation of the model parameters is derived. The benefits of the CAM for interactive retrieval and automated cluster summarization are investigated experimentally.