Exploring topic coherence over many models and many topics

  • Authors:
  • Keith Stevens, Philip Kegelmeyer, David Andrzejewski, David Buttler

  • Affiliations:
  • Keith Stevens: University of California, Los Angeles, California, and Lawrence Livermore National Lab, Livermore, California
  • Philip Kegelmeyer: Sandia National Lab, Livermore, California
  • David Andrzejewski: Lawrence Livermore National Lab, Livermore, California
  • David Buttler: Lawrence Livermore National Lab, Livermore, California

  • Venue:
  • EMNLP-CoNLL '12: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
  • Year:
  • 2012

Abstract

We apply two new automated semantic evaluations to three distinct latent topic models. Both metrics have been shown to align with human evaluations and provide a balance between internal measures of information gain and comparisons to human ratings of coherent topics. We improve upon these metrics by introducing new aggregate measures that allow complete topic models to be compared. We further compare the automated measures to other evaluation methods for topic models: comparison to manually crafted semantic tests, and document classification. Our experiments reveal that LDA and LSA each have different strengths; LDA best learns descriptive topics, while LSA is best at creating a compact semantic representation of documents and words in a corpus.
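To make the kind of automated coherence evaluation the abstract describes concrete, below is a minimal Python sketch of a document co-occurrence coherence score in the UMass style, one of the metric families this line of work builds on, plus a simple mean aggregate for comparing whole models. This is an illustration under stated assumptions, not the paper's implementation: the function names, the `eps` smoothing constant, and the choice of mean as the aggregate are all illustrative.

```python
import math
from collections import defaultdict

def umass_coherence(top_words, docs, eps=1.0):
    """Score one topic's top-N words by how often they co-occur in
    documents: sum over ordered pairs (w_i, w_j), i > j, of
    log((D(w_i, w_j) + eps) / D(w_j)), where D counts documents."""
    vocab = set(top_words)
    df = defaultdict(int)      # D(w): number of documents containing w
    co_df = defaultdict(int)   # D(w_i, w_j): documents containing both
    for doc in docs:
        present = vocab.intersection(doc)
        for w in present:
            df[w] += 1
        for wi in present:
            for wj in present:
                if wi != wj:
                    co_df[(wi, wj)] += 1

    score = 0.0
    for i, wi in enumerate(top_words):
        for wj in top_words[:i]:
            if df[wj] > 0:  # skip words never seen in the reference corpus
                score += math.log((co_df[(wi, wj)] + eps) / df[wj])
    return score

def mean_model_coherence(topics, docs):
    """One simple aggregate over a complete model: the mean of the
    per-topic coherence scores (higher is better)."""
    docs = list(docs)  # allow multiple passes over the corpus
    scores = [umass_coherence(t, docs) for t in topics]
    return sum(scores) / len(scores)

# Tiny usage example with toy data.
docs = [["stock", "market", "price"], ["market", "trade"], ["river", "bank"]]
topics = [["stock", "market", "price"], ["river", "bank", "trade"]]
print(mean_model_coherence(topics, docs))
```

Averaging per-topic scores is only one way to summarize a model; other aggregates (for example, the spread of per-topic scores) expose models that mix a few strong topics with many incoherent ones, which a mean alone can hide.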