The growing number of statistical topic models has created a need for better evaluation of their output. Traditional evaluation methods estimate a model's fit to unseen data, but it has recently been shown that human judgments can differ greatly from these measures. Methods that better emulate human judgment are therefore needed. In this paper we present a system that computes the conceptual relevance of the individual topics of a given model using information drawn from a concept hierarchy, in this case WordNet. We regard conceptual relevance as the ability to attribute a concept to each topic and, based on that concept, to separate words related to the topic from unrelated ones. In multiple experiments we demonstrate the correlation between the automatic evaluation method and the answers of human evaluators, across various corpora and difficulty levels. By shifting the evaluation focus from a statistical to a conceptual one, we are able to detect which topics are conceptually meaningful and rank them accordingly.
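The core idea can be sketched in a few lines: given a topic's top words, search a concept hierarchy for the concept that best covers them, then treat the covered words as topic-related and the rest as unrelated. The sketch below is a minimal illustration under stated assumptions, not the paper's exact algorithm: the tiny `HYPERNYMS` table is a hypothetical stand-in for WordNet, and the coverage-times-depth scoring heuristic is an assumption chosen to favor specific concepts over overly general ones.

```python
# Toy concept hierarchy (child -> parent), a hypothetical stand-in for
# WordNet's hypernym relation.
HYPERNYMS = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
    "animal": "organism", "oak": "tree", "tree": "plant",
    "plant": "organism",
}

def ancestors(word):
    """Return `word` and every concept above it in the hierarchy."""
    chain = [word]
    while word in HYPERNYMS:
        word = HYPERNYMS[word]
        chain.append(word)
    return chain

def conceptual_relevance(topic_words):
    """Pick the concept that best covers the topic's words.

    Candidate concepts are scored by coverage * depth (an assumed
    heuristic: deeper concepts are more specific, so a general root
    that trivially covers everything scores low).  Returns the chosen
    concept, the covered ("related") words, and the coverage ratio.
    """
    coverage = {}
    for w in topic_words:
        for c in ancestors(w):
            coverage.setdefault(c, set()).add(w)
    def depth(c):
        return len(ancestors(c)) - 1  # links from concept to root
    best = max(coverage, key=lambda c: (len(coverage[c]) * depth(c),
                                        len(coverage[c])))
    related = sorted(coverage[best])
    return best, related, len(related) / len(topic_words)

concept, related, score = conceptual_relevance(["dog", "wolf", "cat", "oak"])
# "mammal" covers dog, wolf, cat; "oak" is flagged as unrelated.
```

A topic whose words share a specific covering concept gets a high coverage ratio and would rank as conceptually meaningful; a topic of scattered words finds no good concept and scores low.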