Exploiting the category structure of Wikipedia for entity ranking

  • Authors:
  • Rianne Kaptein;Jaap Kamps

  • Affiliations:
  • Oxyme, Cronenburg 150, 1081GN Amsterdam, The Netherlands;University of Amsterdam, Department of Media Studies, Turfdraagsterpad 9, 1012XT Amsterdam, The Netherlands

  • Venue:
  • Artificial Intelligence
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

The Web has not only grown in size, but also changed its character, due to collaborative content creation and an increasing amount of structure. Current Search Engines find Web pages rather than information or knowledge, and leave it to the searchers to locate the sought information within the Web page. A considerable fraction of Web searches contains named entities. We focus on how the Wikipedia structure can help rank relevant entities directly in response to a search request, rather than retrieve an unorganized list of Web pages with relevant but also potentially redundant information about these entities. Our results demonstrate the benefits of using topical and link structure over the use of shallow statistics. Our main findings are the following. First, we examine whether Wikipedia category and link structure can be used to retrieve entities inside Wikipedia as is the goal of the INEX (Initiative for the Evaluation of XML retrieval) Entity Ranking task. Category information proves to be a highly effective source of information, leading to large and significant improvements in retrieval performance on all data sets. Secondly, we study how we can use category information to retrieve documents for ad hoc retrieval topics in Wikipedia. We study the differences between entity ranking and ad hoc retrieval in Wikipedia by analyzing the relevance assessments. Considering retrieval performance, also on ad hoc retrieval topics we achieve significantly better results by exploiting the category information. Finally, we examine whether we can automatically assign target categories to ad hoc and entity ranking queries. Guessed categories lead to performance improvements that are not as large as when the categories are assigned manually, but they are still significant. We conclude that the category information in Wikipedia is a useful source of information that can be used for entity ranking as well as other retrieval tasks.