Automatic labelling of topic models

Authors:
Jey Han Lau;Karl Grieser;David Newman;Timothy Baldwin
Affiliations:
NICTA Victoria Research Laboratory and University of Melbourne;University of Melbourne;NICTA Victoria Research Laboratory and University of California Irvine;NICTA Victoria Research Laboratory and University of Melbourne
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
Year:
2011

Abstract

We propose a method for automatically labelling topics learned via LDA topic models. We generate our label candidate set from the top-ranking topic terms, titles of Wikipedia articles containing the top-ranking topic terms, and sub-phrases extracted from the Wikipedia article titles. We rank the label candidates using a combination of association measures and lexical features, optionally fed into a supervised ranking model. Our method is shown to perform strongly over four independent sets of topics, significantly better than a benchmark method.
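
The pipeline described in the abstract (generate label candidates, then rank them with association measures) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a toy list of article titles standing in for Wikipedia, takes plain contiguous n-grams as the "sub-phrases", and uses average document-level pointwise mutual information (PMI) as the only association measure. The lexical features and the optional supervised ranking model are left out. All function names and the toy data are hypothetical.

```python
import math


def candidate_labels(topic_terms, article_titles, top_n=5):
    """Collect candidates: top topic terms, matching titles, and title sub-phrases."""
    top = [t.lower() for t in topic_terms[:top_n]]
    candidates = set(top)
    for title in article_titles:
        words = title.lower().split()
        # Keep only titles that contain at least one top-ranking topic term.
        if any(w in top for w in words):
            candidates.add(" ".join(words))
            # Assumption: sub-phrases are contiguous n-grams of length 2..len-1.
            for n in range(2, len(words)):
                for i in range(len(words) - n + 1):
                    candidates.add(" ".join(words[i:i + n]))
    return candidates


def pmi(x, y, docs):
    """Document-level PMI of two words over a list of word sets."""
    n = len(docs)
    cx = sum(1 for d in docs if x in d)
    cy = sum(1 for d in docs if y in d)
    cxy = sum(1 for d in docs if x in d and y in d)
    if cxy == 0:
        return 0.0  # Assumption: pairs that never co-occur count as neutral.
    return math.log((cxy * n) / (cx * cy))


def score(label, topic_terms, docs):
    """Average PMI between every label word and every topic term."""
    pairs = [(w, t) for w in label.split() for t in topic_terms]
    return sum(pmi(w, t, docs) for w, t in pairs) / len(pairs)


def rank_labels(candidates, topic_terms, docs):
    """Sort candidates by descending score, breaking ties alphabetically."""
    return sorted(candidates, key=lambda c: (-score(c, topic_terms, docs), c))


# Toy data (hypothetical): one topic, three titles, four reference documents.
topic_terms = ["stock", "market", "investor"]
titles = ["Stock market", "New York Stock Exchange", "Tennis"]
docs = [
    {"stock", "market", "investor", "exchange"},
    {"stock", "exchange", "new", "york"},
    {"tennis", "match"},
    {"market", "investor", "price"},
]

candidates = candidate_labels(topic_terms, titles)
ranked = rank_labels(candidates, topic_terms, docs)
print(ranked[:3])
```

In a fuller version, the PMI average would be one feature among several, and the final ordering could come from a trained ranker rather than a single score.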