Entity ranking using Wikipedia as a pivot

Authors:
Rianne Kaptein;Pavel Serdyukov;Arjen De Vries;Jaap Kamps
Affiliations:
University of Amsterdam, Amsterdam, Netherlands;Delft University of Technology, Delft, Netherlands;Delft University of Technology and Centrum Wiskunde & Informatica, Delft, Netherlands;University of Amsterdam, Amsterdam, Netherlands
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Abstract

In this paper we investigate the task of Entity Ranking on the Web. Searchers looking for entities are arguably better served by a ranked list of entities presented directly than by a list of web pages with relevant but potentially redundant information about these entities. Since entities are represented by their web homepages, a naive approach to entity ranking is to use standard text retrieval. Our experimental results clearly demonstrate that text retrieval is effective at finding relevant pages, but performs poorly at finding entities. Our proposal is to use Wikipedia as a pivot for finding entities on the Web, allowing us to reduce the hard problem of web entity ranking to the easier problem of Wikipedia entity ranking. Wikipedia allows us to properly identify entities and some of their characteristics, and Wikipedia's elaborate category structure allows us to get a handle on an entity's type. Our main findings are the following. First, in principle, the problem of web entity ranking can be reduced to Wikipedia entity ranking: the majority of entity ranking topics in our test collections can be answered using Wikipedia, and relevant web entities corresponding to the Wikipedia entities can be found with high precision using Wikipedia's 'external links'. Second, we can exploit the structure of Wikipedia to improve entity ranking effectiveness. Entity types are valuable retrieval cues in Wikipedia. Automatically assigned entity types are effective, and almost as good as manually assigned types. Third, web entity retrieval can be significantly improved by using Wikipedia as a pivot. Both Wikipedia's external links and the enriched Wikipedia entities with additional links to homepages are significantly better at finding primary web homepages than anchor text retrieval, which in turn significantly improves over standard text retrieval.
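
The pivot idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `WikiPage` structure, the `rank_web_entities` function, the linear score combination, and the `category_weight` parameter are all hypothetical choices made for illustration. The sketch assumes that each Wikipedia page already carries a text retrieval score for the query, ranks pages by combining that score with a category (entity type) match, and then follows the first external link of each page to reach a candidate web homepage.

```python
from dataclasses import dataclass, field


@dataclass
class WikiPage:
    """Hypothetical, simplified view of a Wikipedia page for one query."""
    title: str
    text_score: float                     # assumed text retrieval score in [0, 1]
    categories: set = field(default_factory=set)
    external_links: list = field(default_factory=list)


def rank_web_entities(pages, target_categories, category_weight=0.5):
    """Rank Wikipedia pages, then pivot to web homepages via external links.

    The linear mix of text score and category match is an assumption of this
    sketch, not a formula taken from the paper.
    """
    def score(page):
        # 1.0 if the page shares at least one category with the target type.
        type_match = 1.0 if page.categories & target_categories else 0.0
        return (1 - category_weight) * page.text_score + category_weight * type_match

    ranked = sorted(pages, key=score, reverse=True)
    # Pivot step: keep only entities that have an external link to follow.
    return [(p.title, p.external_links[0]) for p in ranked if p.external_links]


if __name__ == "__main__":
    pages = [
        WikiPage("Amsterdam", 0.9, {"Cities"}, ["http://city.example"]),
        WikiPage("KLM", 0.6, {"Airlines"}, ["http://klm.example"]),
        WikiPage("Transavia", 0.5, {"Airlines"}, []),
    ]
    for title, url in rank_web_entities(pages, {"Airlines"}):
        print(title, url)
```

In this toy data the page with the best text score has the wrong type, so the category match moves the airline above it, and the entity without any external link is dropped because there is no homepage to pivot to.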