Unsupervised graph-based topic labelling using DBpedia

Authors:
Ioana Hulpus;Conor Hayes;Marcel Karnstedt;Derek Greene
Affiliations:
Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, Galway, Ireland;Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, Galway, Ireland;Digital Enterprise Research Institute (DERI), National University of Ireland, Galway, Galway, Ireland;School of Computer Science and Informatics, University College Dublin, Dublin, Ireland
Venue:
Proceedings of the sixth ACM international conference on Web search and data mining
Year:
2013

Abstract

Automated topic labelling benefits users who want to analyse and understand document collections, as well as search engines that aim to link groups of words to their inherent topics. Current approaches suffer in quality, but we argue that their performance might be improved by focusing on the structure in the data. Building on research in concept disambiguation and linking to DBpedia, we take a novel approach to topic labelling that makes use of the structured data exposed by DBpedia. We start from the hypothesis that words co-occurring in text likely refer to concepts that lie close together in the DBpedia graph. Using graph centrality measures, we show that we are able to identify the concepts that best represent the topics. We compare our graph-based approach with the standard text-based approach on topics extracted from three corpora, based on results gathered in a crowd-sourcing experiment. Our research shows that graph-based analysis of DBpedia can achieve better results for topic labelling in terms of both precision and topic coverage.
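
The idea of picking a label by graph centrality can be illustrated with a minimal sketch. It assumes a tiny hand-made undirected concept graph (the node names and edges are hypothetical, not real DBpedia data) and uses closeness centrality only as one familiar example of a centrality measure; the abstract does not state which measures the authors used, so this should not be read as their method.

```python
from collections import deque


def closeness_centrality(graph):
    """Return closeness per node: reachable nodes / sum of shortest-path lengths."""
    scores = {}
    for source in graph:
        # Breadth-first search gives shortest-path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


# Hypothetical toy graph: concepts that topic words might be linked to,
# plus the neighbouring concepts that connect them (illustrative only).
edges = [
    ("Atom", "Physics"),
    ("Energy", "Physics"),
    ("Electron", "Physics"),
    ("Electron", "Atom"),
    ("Physics", "Science"),
]

# Build an undirected adjacency map from the edge list.
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

scores = closeness_centrality(graph)

# The most central concept is taken as the candidate topic label.
label = max(scores, key=scores.get)
print(label, scores[label])
```

In this toy graph the concept that sits one step away from every other node receives the highest score, which mirrors the hypothesis that concepts behind co-occurring words lie close together in the graph.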