Discovering emerging entities with ambiguous names

Authors:
Johannes Hoffart;Yasemin Altun;Gerhard Weikum
Affiliations:
Max Planck Institute for Informatics, Saarbrücken, Germany;Google Inc., Zürich, Switzerland;Max Planck Institute for Informatics, Saarbrücken, Germany
Venue:
Proceedings of the 23rd international conference on World wide web
Year:
2014

Abstract

Knowledge bases (KBs) contain data about a large number of people, organizations, and other entities. However, this knowledge can never be complete because the world keeps changing: new companies are formed every day and new songs are composed every minute, and many of them become candidates for addition to a KB. To keep up with real-world entities, the KB maintenance process needs to continuously discover newly emerging entities in news and other Web streams. In this paper we focus on the most difficult case, where the names of new entities are ambiguous. This raises the technical problem of deciding whether an observed name refers to a known entity or represents a new entity. This paper presents a method to solve this problem with high accuracy. It is based on a new model for measuring the confidence of mapping an ambiguous mention to an existing entity, and a new model for representing a new entity with the same ambiguous name as a set of weighted keyphrases. The method can handle both Wikipedia-derived entities, which typically constitute the bulk of large KBs, and entities that exist only in other Web sources such as online communities about music or movies. Experiments show that our entity discovery method outperforms previous methods for coping with out-of-KB entities (called unlinkable in entity linking).
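
The decision described in the abstract (known entity versus new entity with the same name) can be illustrated with a toy sketch. The abstract does not give the paper's actual scoring or confidence formulas, so everything below is an assumption made for illustration: the weighted-overlap score, the normalization used as a confidence value, the threshold of 0.5, and all names and data (`resolve`, `EMERGING`, the "Phoenix" keyphrases) are hypothetical.

```python
# Toy sketch only: NOT the method from the paper. Scoring, confidence,
# threshold, and data are all hypothetical assumptions.

EMERGING = "<emerging entity>"

# A keyphrase model maps each keyphrase to a weight.
KeyphraseModel = dict[str, float]


def overlap_score(context: set[str], model: KeyphraseModel) -> float:
    """Sum the weights of the model's keyphrases that occur in the context."""
    return sum(weight for phrase, weight in model.items() if phrase in context)


def resolve(
    context: set[str],
    known: dict[str, KeyphraseModel],
    emerging_model: KeyphraseModel,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> tuple[str, float]:
    """Map a mention to a known entity, or to the emerging-entity placeholder."""
    scores = {name: overlap_score(context, model) for name, model in known.items()}
    scores[EMERGING] = overlap_score(context, emerging_model)

    # Assumed confidence: each candidate's share of the total score.
    total = sum(scores.values())
    conf = {name: (s / total if total > 0 else 0.0) for name, s in scores.items()}

    best = max(conf, key=conf.get)
    if best != EMERGING and conf[best] >= threshold:
        return best, conf[best]
    # Low confidence in every known entity: treat the mention as a new entity.
    return EMERGING, conf[EMERGING]


# Hypothetical data for the ambiguous name "Phoenix".
known_entities = {
    "Phoenix (city)": {"arizona": 0.9, "city": 0.5},
    "Phoenix (band)": {"album": 0.8, "french band": 0.9},
}
# Keyphrases harvested for a not-yet-known entity with the same name.
emerging_phoenix = {"startup": 0.8, "funding round": 0.7}

city_context = {"arizona", "city", "mayor"}
startup_context = {"startup", "funding round", "city"}

city_result = resolve(city_context, known_entities, emerging_phoenix)
startup_result = resolve(startup_context, known_entities, emerging_phoenix)
```

In this sketch the first context resolves to the known city, while the second is dominated by the emerging entity's keyphrases and is therefore flagged as a new entity rather than forced onto an existing KB entry.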