Exploring topic coherence over many models and many topics

  • Authors:
  • Keith Stevens, Philip Kegelmeyer, David Andrzejewski, David Buttler

  • Affiliations:
  • Keith Stevens: University of California, Los Angeles, California, and Lawrence Livermore National Lab, Livermore, California
  • Philip Kegelmeyer: Sandia National Lab, Livermore, California
  • David Andrzejewski: Lawrence Livermore National Lab, Livermore, California
  • David Buttler: Lawrence Livermore National Lab, Livermore, California

  • Venue:
  • EMNLP-CoNLL '12: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
  • Year:
  • 2012

Abstract

We apply two new automated semantic evaluations to three distinct latent topic models. Both metrics have been shown to align with human evaluations and provide a balance between internal measures of information gain and comparisons to human ratings of coherent topics. We improve upon these metrics by introducing new aggregate measures that allow complete topic models to be compared. We further compare the automated measures to other evaluation methods for topic models: comparison to manually crafted semantic tests, and document classification. Our experiments reveal that LDA and LSA each have different strengths; LDA best learns descriptive topics, while LSA is best at creating a compact semantic representation of documents and words in a corpus.
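To make the kind of automated coherence evaluation the abstract describes concrete, below is a minimal Python sketch of a document co-occurrence coherence score in the UMass style, one of the metric families this line of work builds on, plus a simple mean aggregate for comparing whole models. This is an illustration under stated assumptions, not the paper's implementation: the function names, the `eps` smoothing constant, and the choice of mean as the aggregate are all illustrative.

```python
import math
from collections import defaultdict

def umass_coherence(top_words, docs, eps=1.0):
    """Score one topic's top-N words by how often they co-occur in
    documents: sum over ordered pairs (w_i, w_j), i > j, of
    log((D(w_i, w_j) + eps) / D(w_j)), where D counts documents."""
    vocab = set(top_words)
    df = defaultdict(int)      # D(w): number of documents containing w
    co_df = defaultdict(int)   # D(w_i, w_j): documents containing both
    for doc in docs:
        present = vocab.intersection(doc)
        for w in present:
            df[w] += 1
        for wi in present:
            for wj in present:
                if wi != wj:
                    co_df[(wi, wj)] += 1

    score = 0.0
    for i, wi in enumerate(top_words):
        for wj in top_words[:i]:
            if df[wj] > 0:  # skip words never seen in the reference corpus
                score += math.log((co_df[(wi, wj)] + eps) / df[wj])
    return score

def mean_model_coherence(topics, docs):
    """One simple aggregate over a complete model: the mean of the
    per-topic coherence scores (higher is better)."""
    docs = list(docs)  # allow multiple passes over the corpus
    scores = [umass_coherence(t, docs) for t in topics]
    return sum(scores) / len(scores)

# Tiny usage example with toy data.
docs = [["stock", "market", "price"], ["market", "trade"], ["river", "bank"]]
topics = [["stock", "market", "price"], ["river", "bank", "trade"]]
print(mean_model_coherence(topics, docs))
```

Averaging per-topic scores is only one way to summarize a model; other aggregates (for example, the spread of per-topic scores) expose models that mix a few strong topics with many incoherent ones, which a mean alone can hide.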