Towards a fair comparison between name disambiguation approaches

Authors:
João Guerreiro;Daniel Gonçalves;David Martins de Matos
Affiliations:
Technical University of Lisbon, Lisboa, Portugal;Technical University of Lisbon, Lisboa, Portugal;Technical University of Lisbon, Lisboa, Portugal
Venue:
Proceedings of the 10th Conference on Open Research Areas in Information Retrieval
Year:
2013

Abstract

Searching for information about people in search engines is a common and straightforward task that is often hampered by name ambiguity. Users are typically interested in a single person, yet results pages usually mix several people who share the same name. Several approaches to personal name disambiguation exist; however, it remains difficult to isolate the impact of each approach on its own. In this paper, we present a plugin-based framework for comparing name disambiguation approaches and identifying the most promising ones. The framework allowed us to combine different approaches in search of good combinations for this task and to compare state-of-the-art solutions on a common dataset. Preliminary results suggest that biographical information has a greater impact on clustering, that comprehensive texts work better than metadata alone, and that TF-IDF works better than more complex approaches.
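
To make the TF-IDF clustering idea in the abstract concrete, the following is a minimal sketch and not the authors' framework. It assumes whitespace tokenization, a plain TF-IDF weighting, cosine similarity, and a greedy single-pass clustering rule; the function names (`tfidf_vectors`, `cosine`, `cluster`), the threshold value, and the example documents are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokens
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster(docs, threshold=0.05):
    """Greedy single-pass clustering (an illustrative choice, not the paper's method).

    Each document joins the first cluster whose seed (first) document is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    vectors = tfidf_vectors(docs)
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical snippets mentioning two different people named "John Smith".
docs = [
    "john smith professor of physics",
    "john smith plays guitar in a rock band",
    "physics lectures by professor john smith",
    "rock band guitarist john smith releases album",
]
print(cluster(docs))  # -> [[0, 2], [1, 3]]
```

Because the ambiguous name appears in every document, its inverse document frequency is zero and it contributes nothing to the similarity, so the grouping is driven by the surrounding context words.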