Clustering-based relevance feedback for web pages

Authors:
Seung Yeol Yoo;Achim Hoffmann
Affiliations:
School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia;School of Computer Science and Engineering, University of New South Wales, Sydney, NSW, Australia
Venue:
PRICAI'06 Proceedings of the 9th Pacific Rim international conference on Artificial intelligence
Year:
2006

Abstract

Most traditional relevance feedback systems simply take the top-ranked Web pages as the source of weights for candidate query expansion terms. However, the content of such top-ranked Web pages is often composed of heterogeneous sub-topics, which can and should be recognized and distinguished. Current approaches nevertheless treat the retrieved Web pages as one unit and often fail to extract good-quality candidate query expansion terms. Our basic idea in this paper is that Web pages properly clustered into a sub-topic cluster are a better source than the whole set of given Web pages, because they provide more topically coherent relevance feedback for that specific sub-topic. We therefore propose Clustering-Based Relevance Feedback for Web Pages, which uses three methods to cluster retrieved Web pages into several sub-topic clusters. These three methods cooperate to construct good-quality clusters by supporting, respectively, Web page segmentation, term selection, and k seed centroid selection. The automatically selected terms serve as the relevance feedback used to construct all sub-topic clusters and to assign the given Web pages to the proper clusters. Each subset of the selected terms that occurs in the Web pages assigned to a sub-topic cluster serves as the relevance feedback for expanding a query over that sub-topic cluster. Our experimental results show that clustering performance based on two traditional term-weighting methods (an unsupervised method and a supervised method) can be significantly improved with our methods.
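
The general idea of per-cluster relevance feedback can be illustrated with a minimal Python sketch. It is not the authors' method: Web page segmentation is omitted, plain TF-IDF stands in for the paper's term selection, and a simple farthest-first heuristic stands in for the k seed centroid selection. All function names, the toy pages, and the query "jaguar" are assumptions made for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one unit-length sparse TF-IDF vector (term -> weight) per tokenized page."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: tf[t] * math.log(1 + n / df[t]) for t in tf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors (cosine similarity when both are unit length)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Unit-length sum of the member vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def pick_seeds(vectors, k):
    """Assumed heuristic: farthest-first seeding. Requires k <= len(vectors)."""
    seeds = [0]
    while len(seeds) < k:
        rest = (i for i in range(len(vectors)) if i not in seeds)
        # Add the page whose closest existing seed is least similar to it.
        seeds.append(min(rest, key=lambda i: max(dot(vectors[i], vectors[s]) for s in seeds)))
    return [vectors[s] for s in seeds]


def cluster(vectors, k, iterations=10):
    """Spherical k-means style loop: assign pages to the nearest centroid, then update."""
    centroids = pick_seeds(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids


def expansion_terms(centroid_vec, query_terms, top_n=3):
    """Highest-weighted centroid terms that are not already in the query."""
    candidates = [(w, t) for t, w in centroid_vec.items() if t not in query_terms]
    return [t for w, t in sorted(candidates, key=lambda p: (-p[0], p[1]))[:top_n]]


# Toy retrieved pages for the hypothetical query "jaguar", covering two sub-topics.
pages = [
    "jaguar car engine speed dealer".split(),
    "jaguar car dealer price engine".split(),
    "jaguar cat jungle prey habitat".split(),
    "jaguar cat habitat jungle rainforest".split(),
]
labels, centroids = cluster(tfidf_vectors(pages), k=2)
per_cluster_terms = [expansion_terms(c, {"jaguar"}) for c in centroids]
```

In this toy setting each sub-topic cluster yields its own candidate expansion terms, so a query can be expanded separately for each cluster instead of from the mixed set of top-ranked pages.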