Topic models, such as Latent Dirichlet Allocation (LDA), have recently been used to automatically generate topics for text corpora and to subdivide the corpus words among those topics. However, not all of the estimated topics are equally important or correspond to genuine themes of the domain: some topics may be collections of irrelevant words, or may represent insignificant themes. Current approaches to topic modeling rely on manual examination to find meaningful topics. This paper presents the first automated, unsupervised analysis of LDA models that distinguishes junk topics from legitimate ones and ranks topics by significance. The distance between each topic distribution and three definitions of a "junk distribution" is computed using a variety of measures, and these distances are combined into an expressive measure of topic significance using a 4-phase Weighted Combination approach. Our experiments on synthetic and benchmark datasets show the effectiveness of the proposed approach in ranking topic significance.
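To illustrate the core idea, the sketch below ranks topics by their distance from one simple "junk distribution": the uniform distribution over the vocabulary. This is only a minimal illustration, not the paper's full method; the choice of KL divergence as the distance measure, the uniform junk definition, and the function names are assumptions for this example (the paper combines three junk definitions and multiple measures in a weighted scheme).

```python
import numpy as np

def kl_divergence(p, q):
    # KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def rank_topics(topic_word):
    """Rank topics by distance from a uniform 'junk' distribution.

    topic_word: (K, V) array, each row a topic's word distribution.
    A larger KL from uniform means the topic is more concentrated on a
    few words, which this toy heuristic treats as more significant.
    """
    _, vocab_size = topic_word.shape
    junk = np.full(vocab_size, 1.0 / vocab_size)  # uniform junk distribution
    scores = np.array([kl_divergence(t, junk) for t in topic_word])
    order = np.argsort(-scores)  # most significant topic first
    return scores, order

# Toy example: topic 0 is peaked (meaningful-looking),
# topic 1 is near-uniform (junk-like).
topics = np.array([[0.7, 0.1, 0.1, 0.1],
                   [0.25, 0.25, 0.25, 0.25]])
scores, order = rank_topics(topics)
```

In this toy run the peaked topic receives a strictly positive score while the uniform topic scores zero, so it is ranked first; the paper's 4-phase Weighted Combination would instead aggregate several such distances before ranking.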