Web page clustering enhanced by summarization

Authors:
Xuanhui Wang;Dou Shen;Hua-Jun Zeng;Zheng Chen;Wei-Ying Ma
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;Tsinghua University, Beijing, P.R. China;Microsoft Research Asia, Beijing, P.R. China;Microsoft Research Asia, Beijing, P.R. China;Microsoft Research Asia, Beijing, P.R. China
Venue:
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Year:
2004

Abstract

Traditional Web page clustering algorithms build feature vectors from the full text of the documents. Such methods often produce unsatisfactory results because Web pages contain a great deal of noisy information, such as decoration, interaction elements, and advertisements. The widely varying length of Web pages is another significant factor that hurts performance. In this paper, we investigate the use of several summarization techniques to tackle these issues when clustering Web pages. Our experimental results indicate that, compared with the full-text representation of Web pages, the proposed approach effectively solves the problems of noisy information and varying length, and thus significantly boosts clustering performance.
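
To make the idea concrete, the following is a minimal sketch of summarizing a page before building its feature vector. It is not the authors' implementation: the abstract does not name the summarization techniques studied, so a simple frequency-based sentence scorer is swapped in purely for illustration. The helper names (`summarize`, `tf_vector`, `cosine`), the stopword list, and the sample page are all assumptions.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Return lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Hypothetical summarizer: keep the sentences whose words are, on
    average, the most frequent in the page. Capping the sentence count
    also gives every page a representation of comparable length."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(tokenize(text))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


def tf_vector(text):
    """Term-frequency feature vector as a sparse Counter."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Made-up page: two content sentences followed by an advertisement.
page = ("Clustering groups similar pages. "
        "Clustering pages needs clean page features. "
        "Buy cheap watches today.")
summary = summarize(page)
features = tf_vector(summary)  # vector built from the summary, not the full text
```

In this sketch the advertisement sentence scores lowest and is dropped, so its terms never reach the feature vector. Any clustering algorithm that accepts a pairwise similarity, such as `cosine` here, could then be run over the summary vectors in place of the full-text ones.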