Result disambiguation in web people search

Authors:
Richard Berendsen;Bogomil Kovachev;Evangelia-Paraskevi Nastou;Maarten de Rijke;Wouter Weerkamp
Affiliations:
ISLA, University of Amsterdam, Amsterdam, The Netherlands (all authors)
Venue:
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
Year:
2012

Abstract

We study the problem of disambiguating the results of a web people search engine: given a query consisting of a person name plus the result pages for this query, find correct referents for all mentions by clustering the pages according to the different people sharing the name. While the problem has been studied extensively, we discover that the increasing availability of results retrieved from social media platforms causes state-of-the-art methods to break down. We analyze the problem and propose a dual strategy where we distinguish between results obtained from social media platforms and those obtained from other sources. In our dual strategy, the two types of documents are disambiguated separately, using different strategies, and their results are then merged. We study several instantiations for the different stages in our proposed strategy and manage to achieve state-of-the-art performance.
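The dual strategy described in the abstract (split the results by source, disambiguate each group separately, then merge) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the list of social media domains, the bag-of-words Jaccard similarity, the greedy single-pass clustering, the merge rule, and all threshold values. The paper studies several instantiations of each stage, and this sketch does not claim to reproduce any of them.

```python
from urllib.parse import urlparse

# Hypothetical list of social media domains; the abstract does not enumerate them.
SOCIAL_DOMAINS = {"facebook.com", "linkedin.com", "twitter.com"}


def is_social(url):
    """True if the URL's host is an assumed social platform or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in SOCIAL_DOMAINS)


def tokens(text):
    """Lowercased whitespace tokens as a set (a deliberately crude representation)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(docs, threshold):
    """Single-pass clustering: a document joins the first cluster whose pooled
    vocabulary is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for doc in docs:
        toks = tokens(doc["text"])
        for cluster in clusters:
            if jaccard(toks, cluster["vocab"]) >= threshold:
                cluster["docs"].append(doc)
                cluster["vocab"] |= toks
                break
        else:
            clusters.append({"docs": [doc], "vocab": set(toks)})
    return clusters


def merge(social, other, threshold):
    """Attach each social cluster to its best-matching non-social cluster,
    or keep it as a separate person if no match reaches the threshold."""
    merged = [{"docs": list(c["docs"]), "vocab": set(c["vocab"])} for c in other]
    n_other = len(merged)  # only the original non-social clusters are merge targets
    for s in social:
        candidates = merged[:n_other]
        best = max(candidates, key=lambda c: jaccard(s["vocab"], c["vocab"]), default=None)
        if best is not None and jaccard(s["vocab"], best["vocab"]) >= threshold:
            best["docs"].extend(s["docs"])
        else:
            merged.append({"docs": list(s["docs"]), "vocab": set(s["vocab"])})
    return merged


def disambiguate(results, other_t=0.3, social_t=0.5, merge_t=0.3):
    """Dual strategy: cluster social and non-social results separately, then merge.
    Returns one list of URLs per hypothesized person."""
    social_docs = [d for d in results if is_social(d["url"])]
    other_docs = [d for d in results if not is_social(d["url"])]
    clusters = merge(
        greedy_cluster(social_docs, social_t),
        greedy_cluster(other_docs, other_t),
        merge_t,
    )
    return [[d["url"] for d in c["docs"]] for c in clusters]
```

Each result is assumed to be a dict with `url` and `text` keys. Keeping separate thresholds for the two groups reflects the idea that social media profiles and ordinary web pages may need different treatment; the particular values are placeholders and would need tuning on labeled data.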