Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval
The Journal of Machine Learning Research
Entity-based cross-document coreferencing using the Vector Space Model
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Unsupervised personal name disambiguation
CONLL '03 Proceedings of the seventh conference on Natural language learning at HLT-NAACL 2003 - Volume 4
Efficient topic-based unsupervised name disambiguation
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Design challenges and misconceptions in named entity recognition
CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
CU-COMSEM: exploring rich features for unsupervised web personal name disambiguation
SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations
Exploiting Web querying for Web people search
ACM Transactions on Database Systems (TODS)
Name discrimination by clustering similar contexts
CICLing'05 Proceedings of the 6th international conference on Computational Linguistics and Intelligent Text Processing
A Unified Probabilistic Framework for Name Disambiguation in Digital Library
IEEE Transactions on Knowledge and Data Engineering
Hi-index | 0.00 |
Searching for information about people in search engines is a common and straightforward task that is often hampered by name ambiguities. While users are interested in information about a single person, results pages usually comprise many persons with the same name. There are several approaches to tackle personal name disambiguation; however, it is still a challenge to understand the impact of each approach alone. In this paper, we present a plugin-based framework that aims to compare and to identify the most promising approaches for name disambiguation. This framework enabled us to merge different approaches to find good combinations for this task and to compare state-of-the-art solutions using a common dataset. Preliminary results support the greater impact of biographical information to aid in clustering, the use of comprehensive texts instead of only metadata and TF-IDF instead of more complex approaches.