Clustering-based relevance feedback for web pages

  • Authors: Seung Yeol Yoo; Achim Hoffmann
  • Affiliations: School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia (both authors)
  • Venue: PRICAI'06: Proceedings of the 9th Pacific Rim International Conference on Artificial Intelligence
  • Year: 2006

Abstract

Most traditional relevance feedback systems simply take the top-ranked Web pages as the source for weighting candidate query expansion terms. However, the content of such top-ranked Web pages is often composed of heterogeneous sub-topics, which can and should be recognized and distinguished. Current approaches treat the retrieved Web pages as a single unit and therefore often fail to extract good-quality candidate query expansion terms. Our basic idea in this paper is that Web pages properly clustered into a sub-topic cluster are a better source than the whole set of retrieved Web pages, because they provide more topically coherent relevance feedback for that specific sub-topic. We therefore propose Clustering-Based Relevance Feedback for Web Pages, which uses three methods to cluster the retrieved Web pages into several sub-topic clusters. These three methods cooperate to construct good-quality clusters by supporting, respectively, Web page segmentation, term selection, and selection of k seed centroids. The automatically selected terms serve as the relevance feedback used to construct all sub-topic clusters and to assign the given Web pages to the proper clusters. Each subset of the selected terms that occurs in the Web pages assigned to a sub-topic cluster serves as the relevance feedback for expanding a query over that sub-topic cluster. Our experimental results show that clustering performance based on two traditional term-weighting methods (an unsupervised method and a supervised method) can be significantly improved with our methods.
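
The general idea of per-cluster relevance feedback can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the segmentation, term selection, or seed selection procedures, so this sketch assumes plain term-frequency vectors, cosine similarity, a farthest-first heuristic for picking the k seeds, and a single assignment pass. All function names (`tf_vector`, `pick_seeds`, `cluster_pages`, `expansion_terms`) are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector of a page (assumption: whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pick_seeds(vectors, k):
    """Farthest-first seed choice (an assumed stand-in for seed centroid selection).

    Starts from the first page, then repeatedly adds the page whose
    highest similarity to any already chosen seed is lowest.
    """
    seeds = [0]
    while len(seeds) < k:
        rest = (i for i in range(len(vectors)) if i not in seeds)
        seeds.append(min(
            rest,
            key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds),
        ))
    return seeds


def cluster_pages(pages, k):
    """Assign each page to its most similar seed page (one pass, no re-centering)."""
    vectors = [tf_vector(p) for p in pages]
    seeds = pick_seeds(vectors, k)
    clusters = {s: [] for s in seeds}
    for i, vec in enumerate(vectors):
        nearest = max(seeds, key=lambda s: cosine(vec, vectors[s]))
        clusters[nearest].append(i)
    return list(clusters.values()), vectors


def expansion_terms(cluster, vectors, query, n=3):
    """Top-n non-query terms by total frequency inside a single cluster."""
    query_terms = set(query.lower().split())
    totals = Counter()
    for i in cluster:
        totals.update(vectors[i])
    candidates = [(t, c) for t, c in totals.items() if t not in query_terms]
    candidates.sort(key=lambda tc: (-tc[1], tc[0]))  # by count, then alphabetically
    return [t for t, _ in candidates[:n]]


pages = [
    "jaguar car engine speed",
    "jaguar car dealer price",
    "jaguar cat jungle habitat",
    "jaguar cat prey jungle",
]
clusters, vectors = cluster_pages(pages, k=2)
for cluster in clusters:
    print(cluster, expansion_terms(cluster, vectors, "jaguar"))
```

For the ambiguous query "jaguar", the toy pages separate into a car cluster and an animal cluster, and each cluster yields its own expansion terms rather than a mixture drawn from all retrieved pages.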