Clustering web people search results using fuzzy ants

  • Authors:
  • E. Lefever;T. Fayruzov;V. Hoste;M. De Cock

  • Affiliations:
  • LT3 Language and Translation Technology Team, University College Ghent, Groot-Brittanniëlaan 45, 9000 Gent, Belgium and Department of Applied Mathematics and Computer Science, Ghent Universit ...;Department of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281 (S9), 9000 Gent, Belgium;LT3 Language and Translation Technology Team, University College Ghent, Groot-Brittanniëlaan 45, 9000 Gent, Belgium and Department of Applied Mathematics and Computer Science, Ghent Universit ...;Department of Applied Mathematics and Computer Science, Ghent University, Krijgslaan 281 (S9), 9000 Gent, Belgium and Institute of Technology, University of Washington, Tacoma, WA-98402, USA

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2010

Quantified Score

Hi-index 0.07

Abstract

Person name queries often bring up web pages that correspond to different individuals sharing the same name. The Web People Search (WePS) task consists of organizing search results for ambiguous person name queries into meaningful clusters, with each cluster referring to one individual. This paper presents a fuzzy ant-based clustering approach for this multi-document person name disambiguation problem. The main advantage of fuzzy ant-based clustering, a technique inspired by the behavior of ants clustering dead nestmates into piles, is that the number of output clusters does not have to be specified. This makes the algorithm very well suited for the Web Person Disambiguation task, where we do not know in advance how many individuals each person name refers to. We compare our results with those of state-of-the-art partitional and hierarchical clustering approaches (k-means and Agnes) and demonstrate favorable results. This is particularly interesting because the latter require manually setting a similarity threshold or estimating the number of clusters in advance, while the fuzzy ant-based clustering algorithm does not.
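
The dependence of hierarchical baselines on a manually chosen similarity threshold can be illustrated with a minimal sketch. This is not the paper's implementation and not the fuzzy ant algorithm: it is a plain single-link agglomerative clustering over token sets, and the toy documents, the Jaccard similarity, and the function names are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if (a or b) else 0.0


def agglomerative(docs, threshold):
    """Single-link agglomerative clustering (illustrative only).

    Merging stops once the best cluster-to-cluster similarity falls
    below `threshold`, so the number of clusters depends on that value.
    """
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x, y in combinations(range(len(clusters)), 2):
            # Single link: similarity of the closest pair of members.
            sim = max(
                jaccard(docs[i], docs[j])
                for i in clusters[x]
                for j in clusters[y]
            )
            if sim > best_sim:
                best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical search results for one ambiguous name, as token sets.
docs = [
    {"john", "smith", "professor", "physics", "university"},
    {"john", "smith", "physics", "lecture", "university"},
    {"john", "smith", "guitar", "band", "album"},
    {"john", "smith", "band", "tour", "album"},
    {"john", "smith", "lawyer", "court"},
]

for threshold in (0.7, 0.5, 0.2):
    print(threshold, agglomerative(docs, threshold))
```

On this toy data the three thresholds give five, three, and one cluster respectively, so the apparent number of individuals is decided by the threshold rather than by the data alone. This is the tuning step that the abstract says the fuzzy ant-based approach avoids.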