Most traditional relevance feedback systems simply take the top-ranked Web pages as the source of weights for candidate query expansion terms. However, the content of such top-ranked Web pages is often composed of heterogeneous sub-topics, which can and should be recognized and distinguished. Current approaches treat retrieved Web pages as one unit and therefore often fail to extract good-quality candidate query expansion terms. Our basic idea in this paper is that Web pages properly clustered into a sub-topic cluster are a better source than the whole set of given Web pages, because they provide more topically coherent relevance feedback for that specific sub-topic. We therefore propose Clustering-Based Relevance Feedback for Web Pages, which uses three methods to cluster the retrieved Web pages into several sub-topic clusters. The three methods cooperate to construct good-quality clusters by supporting, respectively, Web page segmentation, term selection, and k seed centroid selection. The automatically selected terms serve as the relevance feedback used to construct all sub-topic clusters and to assign the given Web pages to the proper clusters. Each subset of the selected terms that occurs in the Web pages assigned to a sub-topic cluster serves as the relevance feedback used to expand a query over that sub-topic cluster. Our experimental results show that clustering performance based on two traditional term-weighting methods (one unsupervised and one supervised) can be significantly improved with our methods.
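The overall flow described above (weight terms, pick k seed centroids, assign pages to sub-topic clusters, then draw expansion terms per cluster) can be illustrated with a minimal sketch. It is not the method proposed in the paper: the TF-IDF weighting, the farthest-first seed choice, the cosine assignment, and every function name and sample page below are assumptions made for illustration, and Web page segmentation is skipped by treating each page as already-cleaned plain text.

```python
import math
from collections import Counter

def tf_idf_vectors(token_lists):
    """Weight terms by tf * log(n / df); terms found in every page get weight 0 and are dropped."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vec = {t: c * math.log(n / df[t]) for t, c in tf.items()}
        vectors.append({t: w for t, w in vec.items() if w > 0})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def pick_seeds(vectors, k):
    """Assumed heuristic: farthest-first selection starting from the first page."""
    seeds = [0]
    while len(seeds) < min(k, len(vectors)):
        rest = [i for i in range(len(vectors)) if i not in seeds]
        nxt = min(rest, key=lambda i: max(cosine(vectors[i], vectors[s]) for s in seeds))
        seeds.append(nxt)
    return seeds

def assign(vectors, seeds):
    """Assign each page to the seed centroid it is most similar to."""
    return [max(range(len(seeds)), key=lambda c: cosine(v, vectors[seeds[c]]))
            for v in vectors]

def expansion_terms(vectors, labels, cluster, top_n=5):
    """Rank the terms of one sub-topic cluster by their summed weight."""
    totals = Counter()
    for vec, label in zip(vectors, labels):
        if label == cluster:
            totals.update(vec)
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]

# Hypothetical retrieved pages for the ambiguous query "jaguar".
pages = [
    "jaguar car engine speed",
    "jaguar car dealer price",
    "jaguar cat jungle habitat",
    "jaguar cat prey jungle",
]
vectors = tf_idf_vectors([p.split() for p in pages])
seeds = pick_seeds(vectors, k=2)
labels = assign(vectors, seeds)
car_terms = expansion_terms(vectors, labels, cluster=0, top_n=10)
```

In this toy run the query term itself appears in every page, so it receives zero weight and plays no part in clustering; the expansion terms for each cluster come only from pages assigned to that cluster, which is the property the abstract argues for.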