Entity disambiguation has been studied extensively over the last 10 years, with authors reporting increasingly well-performing systems. However, most studies have focused on general-purpose knowledge bases such as Wikipedia or DBpedia and left open the question of how those results generalize to more specialized domains. This question is especially important in the context of Linked Open Data, which forms an enormous resource for disambiguation. Yet the influence of domain heterogeneity, size, and quality of the knowledge base remains largely unexplored. In this paper we present an extensive set of experiments on special-purpose knowledge bases from the biomedical domain, in which we evaluate disambiguation performance along four variables: (i) the representation of the knowledge base as either entity-centric or document-centric, (ii) the size of the knowledge base in terms of entities covered, (iii) the semantic heterogeneity of a domain, and (iv) the quality and completeness of a knowledge base. Our results show that for special-purpose knowledge bases (i) document-centric disambiguation significantly outperforms entity-centric disambiguation, (ii) document-centric disambiguation does not depend on the size of the knowledge base, while entity-centric approaches do, and (iii) disambiguation performance varies greatly across domains. These results suggest that domain heterogeneity, size, and knowledge base quality have to be carefully considered in the design of entity disambiguation systems and that, for constructing knowledge bases, user-annotated texts are preferable to carefully constructed knowledge bases.