Automatic evaluation of topic coherence

Authors:
David Newman;Jey Han Lau;Karl Grieser;Timothy Baldwin
Affiliations:
NICTA Victoria Research Laboratory, Australia and University of California, Irvine;University of Melbourne, Australia;University of Melbourne, Australia;NICTA Victoria Research Laboratory, Australia and University of Melbourne, Australia
Venue:
HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Year:
2010

Citing 23
Cited 31

Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone

SIGDOC '86 Proceedings of the 5th annual international conference on Systems documentation
Unsupervised learning by probabilistic latent semantic analysis

Machine Learning
An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet

CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
Latent dirichlet allocation

The Journal of Machine Learning Research
Automatic word sense discrimination

Computational Linguistics - Special issue on word sense disambiguation
Using corpus statistics and WordNet relations for sense identification

Computational Linguistics - Special issue on word sense disambiguation
Automatic retrieval and clustering of similar words

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
Verbs semantics and lexical selection

ACL '94 Proceedings of the 32nd annual meeting on Association for Computational Linguistics
Applying discrete PCA in data analysis

UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
BLEU: a method for automatic evaluation of machine translation

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Evaluating WordNet-based Measures of Lexical Semantic Relatedness

Computational Linguistics
Automatic labeling of multinomial topic models

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Evaluation methods for topic models

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Text segmentation with LDA-based Fisher kernel

HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers
Using LDA to detect semantically incoherent documents

CoNLL '08 Proceedings of the Twelfth Conference on Computational Natural Language Learning
WikiRelate! computing semantic relatedness using wikipedia

AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2
Bayesian word sense induction

EACL '09 Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics
A study on similarity and relatedness using distributional and WordNet-based approaches

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Exploring content models for multi-document summarization

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Non-parametric Bayesian areal linguistics

NAACL '09 Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Computing semantic relatedness using Wikipedia-based explicit semantic analysis

IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Using information content to evaluate semantic similarity in a taxonomy

IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
Evaluating topic models for digital libraries

Proceedings of the 10th annual joint conference on Digital libraries

Evaluating topic models for digital libraries

Proceedings of the 10th annual joint conference on Digital libraries
A latent dirichlet allocation method for selectional preferences

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Best topic word selection for topic labelling

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Measuring historical word sense variation

Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Interactive topic modeling

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Automatic labelling of topic models

HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Latent topic feedback for information retrieval

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Simultaneous joint and conditional modeling of documents tagged from two perspectives

Proceedings of the 20th ACM international conference on Information and knowledge management
TopicNets: Visual Analysis of Large Text Corpora with Topic Modeling

ACM Transactions on Intelligent Systems and Technology (TIST)
Bayesian checking for topic models

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Optimizing semantic coherence in topic models

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
A new sentence compression dataset and its use in an abstractive generate-and-rank sentence compressor

UCNLG+EVAL '11 Proceedings of the UCNLG+Eval: Language Generation and Evaluation Workshop
Topic analysis for online reviews with an author-experience-object-topic model

AIRS'11 Proceedings of the 7th Asia conference on Information Retrieval Technology
Improving topic evaluation using conceptual knowledge

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Open domain event extraction from twitter

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
SurfShop: combing a product ontology with topic model results for online window-shopping

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstration Session
Topic extraction based on prior knowledge obtained from target documents

ACL '12 Proceedings of ACL 2012 Student Research Workshop
Modelling sequential text with an adaptive topic model

EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
Computing text semantic relatedness using the contents and links of a hypertext encyclopedia

Artificial Intelligence
Similarity measures based on latent dirichlet allocation

CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
On collocations and topic models

ACM Transactions on Speech and Language Processing (TSLP) - Special issue on multiword expressions: From theory to practice and use, part 2
Improving LDA topic models for microblogs via tweet pooling and automatic labeling

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
One theme in all views: modeling consensus topics in multiple contexts

Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
When relevance is not enough: promoting diversity and freshness in personalized question recommendation

Proceedings of the 22nd international conference on World Wide Web
Unsupervised latent concept modeling to identify query facets

Proceedings of the 10th Conference on Open Research Areas in Information Retrieval
Experiments with semantic similarity measures based on LDA and LSA

SLSP'13 Proceedings of the First international conference on Statistical Language and Speech Processing
A Graph Analytical Approach for Topic Detection

ACM Transactions on Internet Technology (TOIT)
Semantic smoothing for text clustering

Knowledge-Based Systems
The dual-sparse topic model: mining focused topics and focused terms in short text

Proceedings of the 23rd international conference on World wide web
A time-based collective factorization for topic discovery and monitoring in news

Proceedings of the 23rd international conference on World wide web
Identifying interesting Twitter contents using topical analysis

Expert Systems with Applications: An International Journal

Abstract

This paper introduces the novel task of topic coherence evaluation, whereby a set of words, as generated by a topic model, is rated for coherence or interpretability. We apply a range of topic scoring models to the evaluation task, drawing on WordNet, Wikipedia and the Google search engine, and existing research on lexical similarity/relatedness. In comparison with human scores for a set of learned topics over two distinct datasets, we show that a simple co-occurrence measure based on pointwise mutual information over Wikipedia data achieves results at or near the level of inter-annotator correlation, and that other Wikipedia-based lexical relatedness methods also achieve strong results. Google produces strong, if less consistent, results, while our results over WordNet are patchy at best.
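To make the co-occurrence measure concrete, the following is a minimal sketch of scoring a topic by the mean pointwise mutual information (PMI) of its word pairs. It is an illustration only, not the authors' implementation: the function name `pmi_coherence`, the toy corpus, document-level co-occurrence counting, averaging over pairs, and the `eps` smoothing constant are all assumptions of this sketch, and a small list of strings stands in for Wikipedia data.

```python
import math
from itertools import combinations


def pmi_coherence(topic_words, documents, eps=1e-12):
    """Mean pairwise PMI of the topic words (hypothetical helper).

    Probabilities are estimated as the fraction of documents that
    contain a word (or both words of a pair). Document-level counting
    is an assumption of this sketch. Requires at least two topic words.
    """
    doc_sets = [set(doc.lower().split()) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        # eps (an assumed smoothing constant) avoids log(0) when a
        # pair never co-occurs in the corpus.
        scores.append(math.log((p12 + eps) / (p1 * p2 + eps)))
    return sum(scores) / len(scores)


# Toy corpus standing in for a large reference corpus.
docs = [
    "stock market trading shares",
    "market shares price stock",
    "cat dog pet animal",
    "dog pet food cat",
]

coherent = pmi_coherence(["stock", "market", "shares"], docs)
incoherent = pmi_coherence(["stock", "cat", "shares"], docs)
print(coherent > incoherent)
```

In this toy setting, a topic whose words tend to appear in the same documents scores higher than one that mixes unrelated words. Such automatic scores could then be compared against human coherence ratings using a rank correlation.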