Web Information Organization Using Keyword Distillation Based Clustering

  • Authors:
  • Tomohide Shibata;Yasuo Bamba;Keiji Shinzato;Sadao Kurohashi

  • Venue:
  • WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
  • Year:
  • 2009

Abstract

This paper describes a system that clusters search results over several thousand Web pages and elaborates cluster labels through keyword distillation. Keyword distillation is a method that handles spelling variations, transliterations, synonyms, inclusion relations, and word ambiguity by using linguistic resources and the context of the user's query. By taking advantage of a search engine infrastructure and a grid computing environment, the system produces a clustering result for 1,000 pages in less than one minute. Experimental results show that the system correctly merged synonymous keywords and is useful for finding topics hidden in the lower-ranked pages of a search result.
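
The synonym-merging part of keyword distillation can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_canonical_map`, `distill_and_cluster`), the hand-written synonym groups, and the toy pages are all assumptions made for illustration. The sketch only merges known variants under one label and groups pages by that label; it does not model transliterations, inclusion relations, word ambiguity, or the query context that the abstract mentions.

```python
from collections import defaultdict


def build_canonical_map(synonym_groups):
    """Map every known variant (lowercased) to one representative keyword."""
    canonical = {}
    for group in synonym_groups:
        representative = group[0]  # assumption: first entry is the preferred label
        for variant in group:
            canonical[variant.lower()] = representative
    return canonical


def distill_and_cluster(page_keywords, synonym_groups):
    """Group page ids under canonical keywords, merging known variants."""
    canonical = build_canonical_map(synonym_groups)
    clusters = defaultdict(set)
    for page_id, keywords in page_keywords.items():
        for keyword in keywords:
            # Unknown keywords keep their original form as the label.
            label = canonical.get(keyword.lower(), keyword)
            clusters[label].add(page_id)
    # Largest clusters first; ties broken alphabetically for stable output.
    return sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))


# Toy data, invented for illustration only (hypothetical linguistic resource).
SYNONYM_GROUPS = [
    ["computer", "computers", "PC"],
    ["virus", "viruses"],
]
PAGE_KEYWORDS = {
    1: ["computer", "virus"],
    2: ["PC", "software"],
    3: ["computers", "viruses"],
}

result = distill_and_cluster(PAGE_KEYWORDS, SYNONYM_GROUPS)
for label, pages in result:
    print(label, sorted(pages))
```

In this toy run, the three spellings "computer", "computers", and "PC" collapse into a single cluster label covering all three pages, which is the kind of merge the abstract reports for synonymous keywords. A real system would presumably draw the variant groups from dictionaries or other linguistic resources rather than a hard-coded list.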