From names to entities using thematic context distance

Authors:
Anja Pilz;Gerhard Paaß
Affiliations:
Fraunhofer IAIS, St. Augustin, Germany;Fraunhofer IAIS, St. Augustin, Germany
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 11
Cited 3

An Evaluation of Statistical Approaches to Text Categorization

Information Retrieval
Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms

Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms
Optimizing search engines using clickthrough data

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
SemTag and seeker: bootstrapping the semantic web via automated semantic annotation

WWW '03 Proceedings of the 12th international conference on World Wide Web
Latent dirichlet allocation

The Journal of Machine Learning Research
Entity-based cross-document coreferencing using the Vector Space Model

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Relational clustering for multi-type entity resolution

MRDM '05 Proceedings of the 4th international workshop on Multi-relational mining
YAGO: A Large Ontology from Wikipedia and WordNet

Web Semantics: Science, Services and Agents on the World Wide Web
What's in Wikipedia?: mapping topics and conflict using socially annotated category structure

Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
More influence means less work: fast latent dirichlet allocation by influence scheduling

Proceedings of the 20th ACM international conference on Information and knowledge management
Ontology-driven automatic entity disambiguation in unstructured text

ISWC'06 Proceedings of the 5th international conference on The Semantic Web

Result disambiguation in web people search

ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
A knowledge-extraction approach to identify and present verbatim quotes in free text

Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
Semantic to intelligent web era: building blocks, applications, and current trends

Proceedings of the Fifth International Conference on Management of Emergent Digital EcoSystems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Name ambiguity arises from the polysemy of names and causes uncertainty about the true identity of entities referenced in unstructured text. This is a major problem in areas like information retrieval or knowledge management, for example when searching for a specific entity or updating an existing knowledge base. We approach this problem of named entity disambiguation (NED) using thematic information derived from Latent Dirichlet Allocation (LDA) to compare the entity mention's context with candidate entities in Wikipedia represented by their respective articles. We evaluate various distances over topic distributions in a supervised classification setting to find the best suited candidate entity, which is either covered in Wikipedia or unknown. We compare our approach to a state of the art method and show that it achieves significantly better results in predictive performance, regarding both entities covered in Wikipedia as well as uncovered entities. We show that our approach is in general language independent as we obtain equally good results for named entity disambiguation using the English, the German and the French Wikipedia.