Automatic labeling of multinomial topic models

Authors:
Qiaozhu Mei;Xuehua Shen;ChengXiang Zhai
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign
Venue:
Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2007

Abstract

Multinomial distributions over words are frequently used to model topics in text collections. A common, major challenge in applying all such topic models to any text mining problem is to label a multinomial topic model accurately so that a user can interpret the discovered topic. So far, such labels have been generated manually in a subjective way. In this paper, we propose probabilistic approaches to automatically labeling multinomial topic models in an objective way. We cast this labeling problem as an optimization problem involving minimizing Kullback-Leibler divergence between word distributions and maximizing mutual information between a label and a topic model. Experiments with a user study were conducted on two text data sets of different genres. The results show that the proposed labeling methods are quite effective at generating labels that are meaningful and useful for interpreting the discovered topic models. Our methods are general and can be applied to labeling topics learned through all kinds of topic models, such as PLSA, LDA, and their variations.
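
The idea of choosing a label by minimizing Kullback-Leibler divergence between word distributions can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy topic, the candidate labels, and the word distribution attached to each label are all hypothetical, and the sketch assumes each candidate label already comes with an estimated word distribution. It omits the mutual information formulation entirely.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for word distributions stored as dicts word -> probability.

    Words missing from q are floored at eps (an assumption made here to
    avoid division by zero; it is not a smoothing method from the paper).
    """
    return sum(
        pw * math.log(pw / max(q.get(w, 0.0), eps))
        for w, pw in p.items()
        if pw > 0
    )


def rank_labels(topic, label_contexts):
    """Sort candidate labels by increasing KL(topic || label distribution)."""
    scored = [(kl_divergence(topic, ctx), label) for label, ctx in label_contexts.items()]
    return [label for _, label in sorted(scored)]


# Hypothetical topic: a multinomial distribution over words.
topic = {"tree": 0.4, "index": 0.3, "query": 0.2, "spatial": 0.1}

# Hypothetical candidate labels, each with an assumed word distribution.
label_contexts = {
    "r tree": {"tree": 0.35, "index": 0.3, "query": 0.2, "spatial": 0.15},
    "neural network": {"layer": 0.5, "training": 0.3, "query": 0.1, "tree": 0.1},
}

best = rank_labels(topic, label_contexts)[0]
print(best)
```

In this toy setting the label whose word distribution lies closest to the topic ranks first. How the label distributions are estimated from a corpus, and how mutual information enters the objective, are left to the paper itself.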