Exploiting the category structure of Wikipedia for entity ranking

Authors:
Rianne Kaptein; Jaap Kamps
Affiliations:
Oxyme, Cronenburg 150, 1081GN Amsterdam, The Netherlands; University of Amsterdam, Department of Media Studies, Turfdraagsterpad 9, 1012XT Amsterdam, The Netherlands
Venue:
Artificial Intelligence
Year:
2013

Abstract

The Web has not only grown in size but also changed in character, owing to collaborative content creation and an increasing amount of structure. Current search engines find Web pages rather than information or knowledge, and leave it to searchers to locate the sought information within each page. A considerable fraction of Web searches contain named entities. We focus on how the Wikipedia structure can help rank relevant entities directly in response to a search request, rather than retrieve an unorganized list of Web pages with relevant but potentially redundant information about these entities. Our results demonstrate the benefits of using topical and link structure over the use of shallow statistics. Our main findings are the following. First, we examine whether Wikipedia category and link structure can be used to retrieve entities inside Wikipedia, as is the goal of the INEX (Initiative for the Evaluation of XML Retrieval) Entity Ranking task. Category information proves to be a highly effective source of information, leading to large and significant improvements in retrieval performance on all data sets. Second, we study how category information can be used to retrieve documents for ad hoc retrieval topics in Wikipedia. We study the differences between entity ranking and ad hoc retrieval in Wikipedia by analyzing the relevance assessments. On ad hoc retrieval topics, too, we achieve significantly better retrieval performance by exploiting the category information. Finally, we examine whether we can automatically assign target categories to ad hoc and entity ranking queries. Automatically guessed categories lead to smaller performance improvements than manually assigned categories, but the improvements are still significant. We conclude that the category information in Wikipedia is a useful source of information that can be used for entity ranking as well as other retrieval tasks.