An Evaluation of Statistical Approaches to Text Categorization
Information Retrieval
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms
Optimizing search engines using clickthrough data
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
SemTag and seeker: bootstrapping the semantic web via automated semantic annotation
WWW '03 Proceedings of the 12th international conference on World Wide Web
The Journal of Machine Learning Research
Entity-based cross-document coreferencing using the Vector Space Model
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Relational clustering for multi-type entity resolution
MRDM '05 Proceedings of the 4th international workshop on Multi-relational mining
YAGO: A Large Ontology from Wikipedia and WordNet
Web Semantics: Science, Services and Agents on the World Wide Web
What's in Wikipedia?: mapping topics and conflict using socially annotated category structure
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
More influence means less work: fast latent dirichlet allocation by influence scheduling
Proceedings of the 20th ACM international conference on Information and knowledge management
Ontology-driven automatic entity disambiguation in unstructured text
ISWC'06 Proceedings of the 5th international conference on The Semantic Web
Result disambiguation in web people search
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
A knowledge-extraction approach to identify and present verbatim quotes in free text
Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
Semantic to intelligent web era: building blocks, applications, and current trends
Proceedings of the Fifth International Conference on Management of Emergent Digital EcoSystems
Hi-index | 0.00 |
Name ambiguity arises from the polysemy of names and causes uncertainty about the true identity of entities referenced in unstructured text. This is a major problem in areas like information retrieval or knowledge management, for example when searching for a specific entity or updating an existing knowledge base. We approach this problem of named entity disambiguation (NED) using thematic information derived from Latent Dirichlet Allocation (LDA) to compare the entity mention's context with candidate entities in Wikipedia represented by their respective articles. We evaluate various distances over topic distributions in a supervised classification setting to find the best suited candidate entity, which is either covered in Wikipedia or unknown. We compare our approach to a state of the art method and show that it achieves significantly better results in predictive performance, regarding both entities covered in Wikipedia as well as uncovered entities. We show that our approach is in general language independent as we obtain equally good results for named entity disambiguation using the English, the German and the French Wikipedia.