A transduction-based approach to fuzzy clustering, relevance ranking and cluster label generation on web search results

  • Authors:
  • Takazumi Matsumoto;Edward Hung

  • Affiliations:
  • Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong;Department of Computing, Hong Kong Polytechnic University, Hung Hom, Hong Kong

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper details a modular, self-contained web search results clustering system that enhances search results by (i) performing clustering on lists of web documents returned by queries to search engines, and (ii) ranking the results and labeling the resulting clusters, by using a calculated relevance value as a degree of membership to clusters. In addition, we demonstrate an external evaluation method based on precision for comparing fuzzy clustering techniques, as well as internal measures suitable for working on non-training data. The built-in label generator uses the membership degrees and relevance values to weight the most relevant results more heavily. The membership degrees of documents to fuzzy clusters also facilitate effective detection and removal of overly similar clusters. To achieve this, our transduction-based clustering algorithm (TCA) and its fuzzy counterpart (FTCA) employ a transduction-based relevance model (TRM) to consider local relationships between each web document. Results from testing on five different real-world and synthetic datasets results show favorable results compared to established label-based clustering algorithms Suffix Tree Clustering (STC) and Lingo.