A new vector space model exploiting semantic correlations of social annotations for web page clustering

  • Authors:
  • Xiwu Gu; Xianbing Wang; Ruixuan Li; Kunmei Wen; Yufei Yang; Weijun Xiao

  • Affiliations:
  • Xiwu Gu, Xianbing Wang, Ruixuan Li, Kunmei Wen, Yufei Yang: Intelligent and Distributed Computing Lab, College of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, P.R. China
  • Weijun Xiao: Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN

  • Venue:
  • WAIM'11: Proceedings of the 12th International Conference on Web-Age Information Management
  • Year:
  • 2011

Abstract

Text clustering can effectively improve the search results and user experience of information retrieval systems. Traditional text clustering approaches are based on the vector space model, in which a document is represented as a vector using a term-frequency-based weighting scheme. The main disadvantage of this model is that it cannot fully exploit the semantic correlations between social annotations and document contents, because a term-frequency-based weighting scheme captures only the number of occurrences of each term in the document. However, the social annotations of web pages carry fundamental and valuable semantic information and can therefore be used to improve information retrieval systems. In this paper, we investigate and evaluate several extended vector space models that combine social annotations with web page text. In particular, we propose a novel vector space model that computes the semantic correlations between social annotations and web page words. Compared with other vector space models, our experiments show that using the semantic correlations between social tags and web page words improves clustering accuracy, increasing the Rand Index (RI) score by 4% to 7%.
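The abstract contrasts the baseline term-frequency vector space model with extended models that also use social tags, and it reports results as Rand Index (RI) scores. The Python sketch below illustrates those pieces under stated assumptions. `tf_vector` builds raw term-frequency vectors and `cosine` compares them. `combined_vector` is a hypothetical extension that simply appends a weighted tag-frequency block to the word block; it is not the semantic-correlation model proposed in the paper, whose details the abstract does not give. `rand_index` computes the standard pairwise Rand Index. All function names and the `tag_weight` parameter are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import combinations
import math


def tf_vector(tokens, vocabulary):
    """Baseline VSM: raw term-frequency vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def combined_vector(words, tags, word_vocab, tag_vocab, tag_weight=1.0):
    """Hypothetical extended VSM: word TF block followed by a weighted tag TF block.

    This is a naive concatenation for illustration only; it does not model
    semantic correlations between tags and words.
    """
    tag_block = [tag_weight * x for x in tf_vector(tags, tag_vocab)]
    return tf_vector(words, word_vocab) + tag_block


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rand_index(labels_true, labels_pred):
    """Fraction of document pairs on which two clusterings agree.

    A pair agrees if both clusterings put the two documents in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have equal length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

For example, with the vocabulary `["python", "clustering", "web"]`, the token lists `["python", "web", "web"]` and `["web", "clustering"]` map to `[1, 0, 2]` and `[0, 1, 1]`, whose cosine similarity is 2/sqrt(10), about 0.63. The clusterings `[0, 0, 1, 1]` (reference) and `[0, 0, 1, 2]` (predicted) agree on 5 of the 6 document pairs, giving an RI of about 0.83.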