A framework for benchmarking entity-annotation systems

Authors: Marco Cornolti; Paolo Ferragina; Massimiliano Ciaramita
Affiliations: University of Pisa, Pisa, Italy (Cornolti, Ferragina); Google Research, Zürich, Switzerland (Ciaramita)
Venue: Proceedings of the 22nd International Conference on World Wide Web
Year: 2013

Abstract

In this paper we design and implement a benchmarking framework for the fair and exhaustive comparison of entity-annotation systems. The framework is based on the definition of a set of problems related to the entity-annotation task, a set of measures for evaluating system performance, and a systematic comparative evaluation involving all publicly available datasets, which contain texts of various types such as news, tweets and Web pages. The framework is easily extensible with new entity annotators, datasets and evaluation measures, and it has been released to the public as open source. We use it to perform the first extensive comparison of all available entity annotators over all available datasets, and draw several conclusions about their efficiency and effectiveness. We also draw conclusions from a comparison of academic and commercial annotators.
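The abstract does not spell out which evaluation measures the framework uses. The sketch below is illustrative only. It assumes two common choices for annotation benchmarks: micro- and macro-averaged precision, recall and F1 under a strict match in which a predicted annotation counts as correct only when both its mention span and its entity agree with the gold annotation. The `(start, length, entity)` tuple layout, the function names and the convention for empty denominators are all hypothetical and are not the framework's actual API.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

# Hypothetical annotation layout: (start offset, mention length, entity title).
Annotation = Tuple[int, int, str]


class Scores(NamedTuple):
    precision: float
    recall: float
    f1: float


def prf(tp: int, fp: int, fn: int) -> Scores:
    """Precision, recall and F1 from match counts.

    Assumed convention (hypothetical): an empty denominator gives 1.0 for
    precision or recall, and F1 is 0.0 when precision and recall are both 0.
    """
    p = tp / (tp + fp) if tp + fp > 0 else 1.0
    r = tp / (tp + fn) if tp + fn > 0 else 1.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return Scores(p, r, f1)


def strict_counts(gold: Set[Annotation], pred: Set[Annotation]) -> Tuple[int, int, int]:
    """Strict match: span and entity must both agree, so tuples must be equal."""
    tp = len(gold & pred)
    fp = len(pred - gold)  # predicted but not in the gold standard
    fn = len(gold - pred)  # in the gold standard but missed
    return tp, fp, fn


def micro_macro(gold_docs: List[Set[Annotation]],
                pred_docs: List[Set[Annotation]]) -> Dict[str, Scores]:
    if len(gold_docs) != len(pred_docs) or not gold_docs:
        raise ValueError("need one non-empty, document-aligned list per side")
    per_doc = [strict_counts(g, p) for g, p in zip(gold_docs, pred_docs)]

    # Micro-averaging: pool the counts over all documents, then score once.
    tp = sum(c[0] for c in per_doc)
    fp = sum(c[1] for c in per_doc)
    fn = sum(c[2] for c in per_doc)
    micro = prf(tp, fp, fn)

    # Macro-averaging: score each document, then average each measure.
    doc_scores = [prf(*c) for c in per_doc]
    n = len(doc_scores)
    macro = Scores(
        sum(s.precision for s in doc_scores) / n,
        sum(s.recall for s in doc_scores) / n,
        sum(s.f1 for s in doc_scores) / n,
    )
    return {"micro": micro, "macro": macro}


if __name__ == "__main__":
    gold = [{(0, 5, "Paris"), (10, 6, "France")}, {(0, 4, "Pisa")}]
    pred = [{(0, 5, "Paris"), (10, 6, "France_national_football_team")},
            {(0, 4, "Pisa")}]
    result = micro_macro(gold, pred)
    print(result["micro"])  # precision, recall and F1 all about 0.667
    print(result["macro"])  # precision, recall and F1 all 0.75
```

In this toy example the first document has one correct and one wrong entity, and the second is fully correct. Micro-averaging weights documents by how many annotations they contain, whereas macro-averaging gives every document equal weight. The difference matters when a benchmark mixes short texts such as tweets with long news articles.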